scipy.stats.mode in genes2genes.ClusterUtils.run_agglomerative_clustering

Hi, thanks for this interesting new approach for studying single-cell trajectories. I was following the tutorial notebook at https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Tutorial.ipynb and ran into errors during the **Clustering alignments** step:

`df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)`

errors with: 

```
IndexError                                Traceback (most recent call last)
<ipython-input-141-2242a2d1f27f> in <module>
----> 1 df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)

/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_clustering(aligner, metric, DIST_THRESHOLD, experiment_mode)
    115         eval_dists = []
    116         for D_THRESH in tqdm(dist_thresholds):
--> 117             gene_clusters, cluster_ids, silhouette_score, silhouette_score_mode, n_small_cluster = run_agglomerative_clustering(E, aligner.gene_list, D_THRESH)
    118 
    119             if(len(gene_clusters.keys())==1):

/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_agglomerative_clustering(E, gene_list, DIST_THRESHOLD, linkage)
     53     silhouette_score = sklearn.metrics.silhouette_score(X=E , labels = model.labels_, metric='precomputed')
     54     silhouette_score_samples = sklearn.metrics.silhouette_samples(X=E , labels = model.labels_, metric='precomputed')
---> 55     silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]
     56 
     57     n_clusters_less_members = []

IndexError: invalid index to scalar variable.
```

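This error in `scipy.stats.mode` might be related to the changes introduced with SciPy v1.9 (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html). With the `keepdims` argument added there, the mode of a 1-D array comes back as a scalar when `keepdims=False` (the default in recent releases, as far as I can tell), so the second `[0]` in line 55 fails. The docs also note: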

> Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.
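
For illustration, here is a minimal sketch of the shape difference. It assumes SciPy >= 1.9 (where the `keepdims` argument exists), and the sample values are made up rather than taken from the tutorial data:

```python
import numpy as np
import scipy.stats

# Made-up values standing in for silhouette_score_samples (assumption)
samples = np.array([0.1, 0.5, 0.5, 0.9])

# keepdims=False: the reduced axis is dropped, so the mode is a 0-d scalar
flat = scipy.stats.mode(samples, keepdims=False)
print(np.ndim(flat.mode))   # 0, so flat[0][0] raises IndexError

# keepdims=True: the reduced axis is retained with size one
kept = scipy.stats.mode(samples, keepdims=True)
print(np.shape(kept.mode))  # (1,), so kept[0][0] works
```

Passing `keepdims=True` explicitly would keep the `[0][0]` indexing working, but only on SciPy versions that accept that argument.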

This is fixed by replacing line 55 in ClusterUtils.py:

`silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]`

with

`silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0]`

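or, to handle both the older array result and the newer scalar result, with something like:

```
import numpy as np

mode_result = scipy.stats.mode(silhouette_score_samples)
# np.atleast_1d turns a scalar mode (newer SciPy) into a 1-element array,
# and leaves a 1-element array (older SciPy) unchanged
silhouette_score_mode = np.atleast_1d(mode_result.mode)[0]
```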
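
Another option that sidesteps the return-shape change entirely is to compute the mode with NumPy alone. This is only a sketch, not the package's method: `mode_via_unique` is a hypothetical helper name, and the sample values are made up. Like `scipy.stats.mode`, it returns the smallest value when several are equally frequent.

```python
import numpy as np

def mode_via_unique(values):
    """Return the most frequent value; ties resolve to the smallest value."""
    uniques, counts = np.unique(np.asarray(values), return_counts=True)
    # np.unique sorts its output and argmax returns the first maximum,
    # so the smallest of the most frequent values is chosen
    return uniques[np.argmax(counts)]

# Made-up values standing in for silhouette_score_samples (assumption)
samples = np.array([0.1, 0.5, 0.5, 0.9])
silhouette_score_mode = mode_via_unique(samples)
print(silhouette_score_mode)  # 0.5
```

The result is always a scalar, regardless of the installed SciPy version.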
