Conversation
| Different combinations of initialization, GMM,
| and cluster numbers are used and the clustering
| with the best selection criterion (BIC or AIC) is chosen.
Suggest making this match LassoLarsIC a bit more closely, e.g. "Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model." You could basically replace "regularization parameter" with "Gaussian mixture parameters".
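Concretely, the adapted sentence could read something like (illustrative wording only, not final): "Such criteria are useful to select the Gaussian mixture parameters by making a trade-off between the goodness of fit and the complexity of the model."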
| n_init : int, optional (default = 1)
|     If ``n_init`` is larger than 1, additional
|     ``n_init``-1 runs of :class:`sklearn.mixture.GaussianMixture`
|     initialized with k-means will be performed
Not necessarily initialized with k-means, right?
|     initialized with k-means will be performed
|     for all covariance parameters in ``covariance_type``.
|
| init_params : {'kmeans' (default), 'k-means++', 'random', 'random_from_data'}
Perhaps worth explaining the options; mainly, I don't know what 'random_from_data' is from this description.
Also, is k-means++ not the default? If not, why not? I think it is in sklearn, if I remember correctly.
Yeah, not sure; apparently 'kmeans' is the default in GaussianMixture.
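For reference, my reading of the four options from the upstream GaussianMixture docstring (the instantiation below is only illustrative):

```python
from sklearn.mixture import GaussianMixture

# 'kmeans'           - responsibilities initialized from a full k-means run (default)
# 'k-means++'        - only the k-means++ seeding step, cheaper than a full k-means run
# 'random'           - responsibilities initialized randomly
# 'random_from_data' - initial means are randomly selected data points
gm = GaussianMixture(n_components=3, init_params="random_from_data", random_state=0)
```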
| Attributes
| ----------
| best_criterion_ : float
LassoLarsIC calls this ``criterion_``.
| covariance_type_ : str
|     Covariance type for the model with the best bic/aic.
| best_model_ : :class:`sklearn.mixture.GaussianMixture` |
In LassoLarsIC, there is no "sub-object" with the best model; rather, the whole class just operates as if it is that model. Does that make sense? While I can't speak for them, my guess is this is closer to what they'd be expecting.
I added attributes like weights_ and means_ from GaussianMixture into GaussianMixtureIC, but I found that I still need to save the best model (I call it best_estimator_ in the newest version) in order to call predict. Did I understand you correctly?
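Roughly, this is the pattern (a minimal sketch with the search collapsed to BIC over n_components only; the real loop also searches covariance_type, and everything here beyond best_estimator_ is illustrative):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.mixture import GaussianMixture


class GaussianMixtureIC(ClusterMixin, BaseEstimator):
    """Sketch: pick n_components by BIC, then act like the winning model."""

    def __init__(self, max_components=5):
        self.max_components = max_components

    def fit(self, X, y=None):
        models = [
            GaussianMixture(n_components=k).fit(X)
            for k in range(1, self.max_components + 1)
        ]
        bics = [m.bic(X) for m in models]
        best = models[int(np.argmin(bics))]
        # expose the fitted parameters directly (as LassoLarsIC does with
        # criterion_), but keep the winning sub-estimator for predict()
        self.best_estimator_ = best
        self.criterion_ = min(bics)
        self.weights_ = best.weights_
        self.means_ = best.means_
        return self

    def predict(self, X):
        # no stored labels_; delegate to the selected model instead
        return self.best_estimator_.predict(X)
```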
| best_model_ : :class:`sklearn.mixture.GaussianMixture`
|     Object with the best bic/aic.
| labels_ : array-like, shape (n_samples,) |
Not a property of GaussianMixture; I recommend not storing it.
| self.criterion = criterion
| self.n_jobs = n_jobs
| def _check_multi_comp_inputs(self, input, name, default): |
I usually make any methods that don't access self into module-level functions.
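i.e. something at module level like this (the body below is hypothetical; only the signature comes from the diff):

```python
# module-level helper: nothing in it touches `self`
def _check_multi_comp_inputs(input, name, default):
    # hypothetical placeholder logic; the real validation stays as in the PR
    if input is None:
        return default
    return input
```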
| name="min_components", | ||
| target_type=int, | ||
| ) | ||
| check_scalar( |
The min value here could be "min_components"?
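For what it's worth, `check_scalar` supports this directly via `min_val` (the values below are made up for illustration):

```python
from sklearn.utils.validation import check_scalar

min_components, max_components = 1, 5  # hypothetical values
check_scalar(
    max_components,
    name="max_components",
    target_type=int,
    min_val=min_components,  # enforces max_components >= min_components
)
```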
| else:
|     criterion_value = model.aic(X)
| # change the precision of "criterion_value" based on sample size |
| )
| best_criter = [result.criterion for result in results]
| if sum(best_criter == np.min(best_criter)) == 1: |
This all seems fine, but just a suggestion: https://numpy.org/doc/stable/reference/generated/numpy.argmin.html
The docs imply that for ties, argmin gives the first occurrence. So in other words, if the results are sorted in order of complexity, just using argmin would do what you want (you can even leave a comment to this effect, if you go this route).
Note that I think having the results sorted by complexity is probably desirable anyway.
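A small illustration with made-up criterion values:

```python
import numpy as np

# criteria ordered from simplest to most complex model (values are made up)
best_criter = np.array([120.4, 118.9, 118.9, 119.7])
# np.argmin returns the first occurrence of the minimum, so the tie
# between indices 1 and 2 resolves to the simpler model
best_idx = int(np.argmin(best_criter))  # -> 1
```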
| class _CollectResults: |
This is effectively a dictionary; recommend just using one, or a named tuple? I am just anti classes that only store data and don't have any methods, but that is just my style :)
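e.g. a named-tuple sketch (the field names here are inferred from how the results are consumed below; adjust to taste):

```python
from typing import NamedTuple

from sklearn.mixture import GaussianMixture


class FitResult(NamedTuple):
    model: GaussianMixture
    criterion: float


# call sites keep attribute access, e.g. result.criterion
```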
| param_grid = dict(
|     covariance_type=covariance_type,
|     n_components=range(self.min_components, self.max_components + 1),
| )
| param_grid = list(ParameterGrid(param_grid))
| seeds = random_state.randint(np.iinfo(np.int32).max, size=len(param_grid))
| if parse_version(joblib.__version__) < parse_version("0.12"):
|     parallel_kwargs = {"backend": "threading"}
| else:
|     parallel_kwargs = {"prefer": "threads"}
| results = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, **parallel_kwargs)(
|     delayed(self._fit_cluster)(X, gm_params, seed)
|     for gm_params, seed in zip(param_grid, seeds)
| )
| best_criter = [result.criterion for result in results]
Why not just use GridSearchCV as in their example? https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html#sphx-glr-auto-examples-mixture-plot-gmm-selection-py
It would abstract away some of the stuff you have to do to make Parallel work, for instance.
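For reference, the core of that example, lightly condensed (the data and parameter ranges below are placeholders):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV


def gmm_bic_score(estimator, X):
    # GridSearchCV maximizes scores, so return the negative BIC (lower is better)
    return -estimator.bic(X)


X = np.random.RandomState(0).randn(500, 2)  # synthetic stand-in data

param_grid = {
    "n_components": range(1, 7),
    "covariance_type": ["spherical", "tied", "diag", "full"],
}
grid_search = GridSearchCV(
    GaussianMixture(), param_grid=param_grid, scoring=gmm_bic_score
)
grid_search.fit(X)
best_gmm = grid_search.best_estimator_
```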