Undersplitting
Undersplitting is a term used in several disciplines to describe a situation in which a process that partitions a domain into smaller units applies too few boundaries, resulting in overly coarse granularity. It is commonly contrasted with over-splitting. In statistics, machine learning, and data analysis, undersplitting occurs when a model or algorithm creates too few partitions (such as clusters, segments, or decision regions), leading to underfitting and a failure to capture structure or variability in the data. Causes include strong regularization, small sample sizes, or a bias toward simplicity; consequences include increased bias, distorted estimates, and poorer predictive performance on diverse inputs.
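A minimal sketch of undersplitting in clustering, using hypothetical one-dimensional data: forcing a clearly bimodal sample into a single cluster leaves a much larger within-cluster error than a partition that matches the data's two modes.

```python
# Sketch of undersplitting in clustering on hypothetical 1-D data:
# two well-separated groups are forced into one cluster, so a single
# mean fails to capture either group's structure.

def within_ss(groups):
    """Total within-cluster sum of squared deviations for a partition."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total

data = [1.0, 1.2, 0.9, 1.1, 9.0, 9.2, 8.9, 9.1]  # bimodal: modes near 1 and 9

undersplit = within_ss([data])               # one cluster: too coarse
split = within_ss([data[:4], data[4:]])      # two clusters: matches structure

print(undersplit, split)  # the one-cluster partition has far higher error
```

Here the undersplit (one-cluster) partition's error is over a thousand times larger than the two-cluster partition's, reflecting the structure left uncaptured.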
In linguistics and natural language processing, undersplitting can refer to under-segmentation, where text is not divided into enough units (such as sentences, words, or morphemes), so that distinct elements are merged into a single segment.
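Under-segmentation can be illustrated with a toy sentence splitter (the rules and text here are illustrative, not any standard tokenizer): treating only the period as a boundary merges questions and exclamations into neighboring sentences, producing fewer, coarser segments.

```python
import re

text = "Where is it? It moved! The cat sat. The dog barked."

# Undersplit: only '. ' is treated as a boundary, so question marks and
# exclamation points do not end a segment (under-segmentation).
coarse = [s for s in text.split(". ") if s]

# Finer segmentation: '.', '!', and '?' all end sentences.
fine = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

print(len(coarse), len(fine))  # 2 coarse segments vs 4 correct sentences
```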
Detection and remediation typically involve examining model or analysis granularity, residual patterns, and cross-validation performance to determine whether a finer partitioning of the data is warranted.
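One common diagnostic can be sketched as follows (a hypothetical piecewise-constant fit, not a specific library routine): fit the data with increasing numbers of segments and watch the residual error; a sharp drop followed by a plateau suggests the coarser fits were undersplit.

```python
# Sketch: detecting undersplitting by comparing residual error across
# candidate numbers of segments in a piecewise-constant fit.

def segment_error(values, k):
    """Sum of squared residuals after fitting k equal-length segments,
    each by its own mean."""
    n = len(values)
    err = 0.0
    for i in range(k):
        seg = values[i * n // k:(i + 1) * n // k]
        mean = sum(seg) / len(seg)
        err += sum((v - mean) ** 2 for v in seg)
    return err

data = [0.0] * 10 + [5.0] * 10  # step function: one change point at the midpoint

errors = {k: segment_error(data, k) for k in (1, 2, 4)}
print(errors)  # error drops sharply from k=1 to k=2, then plateaus
```

The error falls steeply between one and two segments and is flat afterward, indicating that the one-segment model undersplits while two segments suffice.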
See also: over-splitting, underfitting, segmentation, granularity.