Clustering

Clustering is a task in unsupervised learning and statistics that groups objects so that those within a cluster are more similar to each other than to objects in other clusters. The goal is to discover structure, patterns, or subgroups without predefined labels.

Clustering relies on a measure of similarity or distance between objects and supports numerical, categorical, or mixed data. Algorithms fall into partitioning, hierarchical, density-based, grid-based, and model-based families, with examples such as k-means, hierarchical clustering, DBSCAN, and Gaussian mixture models.

Partitioning methods assign objects to a predefined number of clusters by optimizing cohesion. Hierarchical clustering builds a cluster tree through merging or splitting using various linkage criteria, while density-based methods identify dense regions separated by sparse areas.

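As a rough illustration of the first three families, the sketch below runs k-means (partitioning), agglomerative clustering with Ward linkage (hierarchical), and DBSCAN (density-based) on the same synthetic data. The use of scikit-learn, the toy dataset, and the parameter values are assumptions made for the example, not prescriptions from the text above.

```python
# Illustrative sketch (assumes scikit-learn and NumPy are available):
# one partitioning, one hierarchical, and one density-based algorithm
# applied to the same synthetic 2-D data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# Synthetic data: 300 points drawn from 3 Gaussian blobs (assumed setup).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Partitioning: k-means requires the number of clusters up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Hierarchical: agglomerative merging with Ward linkage, cut into 3 clusters.
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Density-based: DBSCAN finds dense regions; eps and min_samples are guesses here.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("k-means clusters:       ", np.unique(kmeans_labels))
print("agglomerative clusters: ", np.unique(agglo_labels))
print("DBSCAN clusters (-1 = noise):", np.unique(dbscan_labels))
```
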
Model-based clustering assumes data arise from a mixture of distributions; parameters are inferred by methods such as Expectation-Maximization. Grid-based methods quantize the feature space into cells and cluster adjacent dense cells.

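The model-based part of this description can be illustrated with a Gaussian mixture fitted by Expectation-Maximization. The sketch below assumes scikit-learn and a synthetic dataset; the component count and covariance setting are illustrative choices, not requirements.

```python
# Illustrative sketch (assumes scikit-learn): a Gaussian mixture model
# whose parameters are estimated with the Expectation-Maximization algorithm.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# GaussianMixture runs EM internally: the E-step computes responsibilities,
# the M-step re-estimates means, covariances, and mixing weights.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)

print("Mixing weights:", gmm.weights_.round(3))
print("Component means:\n", gmm.means_.round(2))
# Soft assignments: per-point probabilities of belonging to each component.
print("First point responsibilities:", gmm.predict_proba(X[:1]).round(3))
```
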
The choice of distance measure and feature scaling strongly influences results. Common metrics include Euclidean, Manhattan, cosine, and Jaccard for binary or categorical data.

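A minimal sketch of these metrics, assuming SciPy's distance functions and two toy vectors (binary vectors for the Jaccard case):

```python
# Illustrative sketch (assumes SciPy and NumPy): common distance metrics
# evaluated on small toy vectors.
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine, jaccard

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

print("Euclidean:", euclidean(a, b))     # straight-line distance
print("Manhattan:", cityblock(a, b))     # sum of absolute differences
print("Cosine distance:", cosine(a, b))  # 1 - cosine similarity (0 here: same direction)

# Jaccard dissimilarity is defined on binary (presence/absence) data.
u = np.array([1, 1, 0, 1], dtype=bool)
v = np.array([1, 0, 0, 1], dtype=bool)
print("Jaccard:", jaccard(u, v))         # 1 - |intersection| / |union|
```
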
Evaluation uses internal indices, such as the silhouette coefficient and Davies-Bouldin index, or external indices like the Rand index and mutual information when a ground truth is available.

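A short sketch of both kinds of evaluation, assuming scikit-learn; the adjusted Rand index and adjusted mutual information are used here as commonly implemented variants of the external indices named above.

```python
# Illustrative sketch (assumes scikit-learn): internal and external
# evaluation of a k-means clustering on labeled synthetic data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             adjusted_rand_score, adjusted_mutual_info_score)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Internal indices: no ground truth needed.
print("Silhouette:    ", silhouette_score(X, labels))        # higher is better
print("Davies-Bouldin:", davies_bouldin_score(X, labels))    # lower is better

# External indices: compare against the known generating labels.
print("Adjusted Rand: ", adjusted_rand_score(y_true, labels))
print("Adjusted MI:   ", adjusted_mutual_info_score(y_true, labels))
```
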
Applications include market segmentation, image and text clustering, bioinformatics, anomaly detection, and social network analysis.

Limitations include choosing the number of clusters, sensitivity to outliers and scaling, difficulties in high-dimensional data, and challenges in interpreting and validating clusters.

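One common, though not definitive, way to cope with the first two limitations is sketched below: standardize the features and compare silhouette scores over a range of candidate cluster counts. The use of scikit-learn, the toy data, and the candidate range are assumptions of the example.

```python
# Illustrative sketch (assumes scikit-learn): selecting a cluster count k
# by comparing silhouette scores on standardized features.
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)
X_scaled = StandardScaler().fit_transform(X)  # reduces sensitivity to feature scale

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X_scaled)
    print(f"k={k}  silhouette={silhouette_score(X_scaled, labels):.3f}")
# The k with the highest silhouette is a reasonable, not definitive, choice.
```
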
The field dates to mid-20th century developments, with k-means popularized in the 1950s–1980s.