feat: Add HDBSCAN clustering method #90
Closed
ChenChihYuan wants to merge 3 commits into wassimj:main from
added 2 commits
March 25, 2026 23:13
Implement Cluster.HDBSCAN() - a pure numpy/scipy implementation of the Hierarchical Density-Based Spatial Clustering of Applications with Noise algorithm. Unlike DBSCAN, HDBSCAN does not require an epsilon parameter, making it more robust for datasets with clusters of varying density.

Algorithm steps:
- Compute core distances for each point
- Build mutual reachability distance matrix
- Construct minimum spanning tree (Prim's algorithm)
- Build condensed cluster tree via top-down dendrogram walk
- Extract stable clusters using Excess of Mass (eom) or leaf method

Also adds HDBSCAN demonstration cells to the Unsupervised_Learning notebook with spiral data and gallery floor plan examples, plus a DBSCAN vs HDBSCAN comparison table.
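The first three steps of the commit message (core distances, mutual reachability, and a Prim's-algorithm spanning tree) can be sketched in a few lines of numpy/scipy. This is an illustrative sketch only, not the code in this PR: the function names `mutual_reachability` and `prim_mst` are hypothetical, and it assumes a point does not count as its own neighbor when finding the minSamples-th nearest neighbor.

```python
import numpy as np
from scipy.spatial.distance import cdist


def mutual_reachability(points, min_samples=5):
    """Return (core distances, mutual reachability matrix) for an (n, d) array.

    Hypothetical helper; requires min_samples < n.
    """
    dist = cdist(points, points)
    # Column 0 of each sorted row is the point itself (distance 0), so column
    # min_samples is the min_samples-th neighbor when self is excluded
    # (an assumption; conventions differ between implementations).
    core = np.sort(dist, axis=1)[:, min_samples]
    # d_mreach(a, b) = max(core(a), core(b), d(a, b))
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)
    return core, mreach


def prim_mst(weights):
    """Prim's algorithm on a dense symmetric weight matrix.

    Returns a list of (i, j, weight) edges; O(n^2) time, which suits a
    dense mutual reachability matrix.
    """
    n = weights.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = weights[0].copy()         # cheapest known edge from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree-side endpoint of that edge
    edges = []
    for _ in range(n - 1):
        # Pick the cheapest node that is not yet in the tree.
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        # Relax edges leaving the newly added node.
        closer = (weights[j] < best) & ~in_tree
        best[closer] = weights[j][closer]
        parent[closer] = j
    return edges
```

Sorting the returned edges by weight gives the order in which a single-linkage dendrogram would merge points, which is the input to the condensed-tree step.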
…d add allowSingleCluster parameter

- Fixed EOM bottom-up processing to use reverse sort (leaves before parents)
- Added allowSingleCluster parameter (default False) matching the standard HDBSCAN library behavior: prevents trivial single-cluster results by not allowing the root cluster to dominate its children in EOM selection
- Updated spiral demo note text to reflect corrected behavior
- HDBSCAN now correctly finds multiple clusters on spiral and gallery data
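The bottom-up EOM pass and the effect of allowSingleCluster described in this commit can be illustrated with a small sketch. It is not the PR's implementation: `select_eom`, its dictionary-based tree representation, and the assumption that child cluster ids are always larger than their parent's id are all hypothetical choices made for illustration.

```python
def select_eom(children, stability, root, allow_single_cluster=False):
    """Pick clusters by Excess of Mass from a condensed tree (hypothetical helper).

    children:  dict mapping cluster id -> list of child cluster ids
    stability: dict mapping cluster id -> stability score
    Assumes child ids are larger than parent ids, so iterating in reverse
    sorted order visits leaves before their parents.
    """
    selected = {c: True for c in stability}
    subtree_score = dict(stability)  # best total stability found in each subtree
    for c in sorted(stability, reverse=True):
        kids = children.get(c, [])
        if not kids:
            continue  # leaves stay selected until an ancestor overrides them
        kids_total = sum(subtree_score[k] for k in kids)
        # Unless a single cluster is allowed, the root may never win.
        root_blocked = (c == root) and not allow_single_cluster
        if stability[c] >= kids_total and not root_blocked:
            # The parent is more stable: keep it and drop every descendant.
            stack = list(kids)
            while stack:
                d = stack.pop()
                selected[d] = False
                stack.extend(children.get(d, []))
        else:
            # The children win: drop the parent and pass their score upward.
            selected[c] = False
            subtree_score[c] = kids_total
    return sorted(c for c, keep in selected.items() if keep)
```

With a very stable root, `allow_single_cluster=False` still returns the best clusters below the root, while `allow_single_cluster=True` lets the root absorb everything into one cluster.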
Owner
Hi @ChenChihYuan. Can you please re-send this with just the changes to Cluster.py, without the Jupyter notebook? Thanks!
Author
Yes, I will do that. Sorry for bothering! DBSCAN for AEC usage looks pretty great, so I thought about using HDBSCAN.
Add HDBSCAN Clustering Method

Summary
This PR adds Cluster.HDBSCAN(), a pure numpy/scipy implementation of the Hierarchical Density-Based Spatial Clustering of Applications with Noise algorithm, to topologicpy's Cluster class. It also adds demonstration and comparison cells to the Unsupervised_Learning.ipynb notebook.

Motivation
DBSCAN requires users to specify an epsilon (neighborhood radius) parameter, which can be difficult to tune, especially for datasets with clusters of varying density. HDBSCAN eliminates this requirement by building a hierarchical density-based clustering and automatically extracting the most stable clusters. This makes it a more robust, general-purpose density-based clustering method.

Implementation Details
Algorithm (Cluster.HDBSCAN()):
- Compute core distances: the distance from each point to its minSamples-th nearest neighbor
- Build the mutual reachability distance matrix: d_mreach(a,b) = max(core(a), core(b), d(a,b))
- Construct a minimum spanning tree (Prim's algorithm)
- Build the condensed cluster tree via a top-down dendrogram walk, keeping clusters of at least minClusterSize points
- Extract stable clusters using Excess of Mass (eom) or the leaf method

API: follows existing Cluster.DBSCAN() conventions.

Dependencies: Uses only numpy and scipy (both already imported in Cluster.py). No new dependencies.

Notebook additions: 12 cells in Unsupervised_Learning.ipynb.

Testing
pytest tests pass