clustersisHAC
ClustersisHAC is a clustering approach that combines hierarchical agglomerative clustering (HAC) with a focus on cluster stability to identify robust groupings in data. The method aims to produce partitions that persist across resampling and different linkage choices, rather than relying on a single HAC run.
- Input: a data matrix, a distance or similarity measure, and a set of HAC linkages (for example, single, complete, and average linkage).
- Subsampling: generate multiple bootstrap resamples or subsamples of the data.
- HAC runs: perform HAC on each subsample with each chosen linkage, producing one dendrogram per subsample and linkage.
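- Stability assessment: compare cluster assignments across subsamples and linkages using a similarity metric such as the adjusted Rand index or the Jaccard coefficient, computed on the observations that both compared runs contain.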
- Consensus clustering: identify clusters that show high stability across resamples and linkages, and select the dendrogram cut levels that yield the most stable partitions.
- Output: a robust clustering solution with stability scores for clusters or partitions, and the final cluster assignments.
- Nonparametric in the sense that it does not require a predefined number of clusters; stability guides the choice of cut level and hence the number of clusters.
- Flexible with respect to distance metrics and linkage methods, enabling integration of multiple HAC configurations.
- Particularly suited to noisy or high-variance data, where single HAC runs may yield unstable partitions.
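- Computationally intensive: the number of HAC runs is the number of subsamples times the number of linkages, each run needs at least quadratic time in the subsample size just to compute pairwise distances, and the stability comparisons add further cost.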
- Interpreting stability scores requires careful consideration of the data and chosen metrics.
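The subsampling and HAC-run steps can be sketched in plain Python. This is a minimal illustration under assumptions the text does not state: Euclidean distance, subsampling without replacement at a fixed fraction (80% by default), only single, complete, and average linkage, and each hierarchy cut directly at a chosen number of clusters `k`. Because merging stops once `k` clusters remain, no full dendrogram is stored; for these linkages that corresponds to cutting the dendrogram into `k` clusters. `hac_cut` and `resampled_partitions` are hypothetical helper names, and the naive merge loop is far slower than library implementations.

```python
import math
import random

# Hypothetical helpers for illustration only; not a ClustersisHAC API.


def hac_cut(points, k, linkage="average"):
    """Naive agglomerative clustering on Euclidean distances: repeatedly merge
    the closest pair of clusters until k remain, then return one label per point."""
    dist = [[math.dist(p, q) for q in points] for p in points]

    def linkage_distance(a, b):
        pair_dists = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)
        if linkage == "complete":
            return max(pair_dists)
        if linkage == "average":
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unsupported linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def resampled_partitions(data, k, linkages=("single", "complete", "average"),
                         n_resamples=20, fraction=0.8, seed=0):
    """Draw subsamples without replacement and cut each linkage's hierarchy at k.
    Returns a list of (linkage, sampled_indices, labels) tuples."""
    rng = random.Random(seed)
    size = max(k, int(fraction * len(data)))  # assumes k <= len(data)
    runs = []
    for _ in range(n_resamples):
        idx = sorted(rng.sample(range(len(data)), size))
        subsample = [data[i] for i in idx]
        for linkage in linkages:
            runs.append((linkage, idx, hac_cut(subsample, k, linkage)))
    return runs
```

For example, `resampled_partitions(data, k=2, n_resamples=5)` on a list of 2-D tuples returns 15 runs (5 subsamples times 3 linkages), each carrying the sampled indices so that later comparisons can be restricted to shared observations.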
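For the stability-assessment step, one concrete choice is the adjusted Rand index (ARI), evaluated only on the observations two runs share, since different subsamples contain different observations. The sketch below assumes ARI as the agreement measure and represents each run as a pair of sampled indices and labels aligned with them; `adjusted_rand_index` and `pairwise_stability` are hypothetical names, not part of any documented ClustersisHAC interface.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not a ClustersisHAC API.


def _pairs(count):
    """Number of unordered pairs among `count` items."""
    return count * (count - 1) // 2


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same observations."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same n >= 2 observations")
    # Pairs placed together in both partitions, from the contingency table.
    index = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only reached when both labelings are all singletons or both are a
        # single cluster, i.e. the partitions are identical.
        return 1.0
    return (index - expected) / (max_index - expected)


def pairwise_stability(run_a, run_b):
    """ARI between two runs restricted to the observations both contain.
    Each run is (sampled_indices, labels), with labels aligned to indices."""
    idx_a, labels_a = run_a
    idx_b, labels_b = run_b
    position_b = {obs: p for p, obs in enumerate(idx_b)}
    shared = [(labels_a[p], labels_b[position_b[obs]])
              for p, obs in enumerate(idx_a) if obs in position_b]
    if len(shared) < 2:
        return float("nan")  # too few shared observations to compare
    return adjusted_rand_index([a for a, _ in shared], [b for _, b in shared])
```

ARI is 1 for identical partitions regardless of label names and close to 0 for chance-level agreement, so averaging it over many pairs of runs gives a stability score for a given cut.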
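The cut-level selection in the consensus step can be formalized, as one possible rule rather than the method's stated one, by averaging pairwise ARI over runs and picking the most stable number of clusters. Here $R$ is the number of runs (subsamples times linkages), $P_r^{(k)}$ is run $r$'s partition when its dendrogram is cut into $k$ clusters, $S_{rs}$ is the set of observations sampled in both runs $r$ and $s$ (assumed to contain at least two observations), and $K$ is the largest number of clusters considered. The case $k = 1$ is excluded because a single cluster is trivially stable.

```latex
\bar{S}(k) = \frac{2}{R(R-1)} \sum_{1 \le r < s \le R}
  \operatorname{ARI}\bigl(P_r^{(k)}\big|_{S_{rs}},\; P_s^{(k)}\big|_{S_{rs}}\bigr),
\qquad
\hat{k} = \operatorname*{arg\,max}_{2 \le k \le K} \bar{S}(k)
```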
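Cluster-level stability scores can be illustrated with a co-clustering (consensus) matrix, a construction used in resampling-based consensus clustering: entry (i, j) is the fraction of runs containing both observations in which they landed in the same cluster. The text does not say that ClustersisHAC builds such a matrix, so this is an assumption; `consensus_matrix` and `cluster_stability` are hypothetical names, and each run is again assumed to be a pair of sampled indices and aligned labels.

```python
# Hypothetical helpers for illustration only; not a ClustersisHAC API.


def consensus_matrix(n, runs):
    """Co-clustering frequencies for n observations.

    runs: iterable of (sampled_indices, labels) pairs.
    Entry [i][j] is the fraction of runs containing both i and j in which
    they share a cluster, or None if no run sampled them together."""
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for idx, labels in runs:
        for p, i in enumerate(idx):
            for q, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[p] == labels[q]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else None
             for j in range(n)] for i in range(n)]


def cluster_stability(consensus, members):
    """Mean consensus over pairs of cluster members that were sampled together;
    values near 1 mean the members were almost always grouped together."""
    values = [consensus[i][j] for i in members for j in members
              if i < j and consensus[i][j] is not None]
    return sum(values) / len(values) if values else float("nan")
```

A natural follow-up, again an assumption rather than something the text specifies, is to run HAC once more using 1 minus the consensus values as distances to obtain the final clusters, and to report `cluster_stability` for each of them.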
See also: hierarchical clustering, cluster stability, resampling methods.