Low cardinality

Low cardinality is a term used in statistics and data management to describe a categorical variable that has a relatively small number of distinct values compared with the size of the dataset. Cardinality is the count of unique values in a column. A low-cardinality feature may have only a handful of categories, such as gender, payment method, or regional codes, whereas high-cardinality examples include user identifiers or precise timestamps.
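A minimal Python sketch of the definition: cardinality is the number of distinct values in a column. To decide whether that number is "small compared with the size of the dataset", the sketch uses one common heuristic, the ratio of distinct values to rows. The helper names and the 5% `max_ratio` cutoff are hypothetical choices for illustration, not a standard threshold.

```python
from collections.abc import Hashable, Sequence


def cardinality(column: Sequence[Hashable]) -> int:
    """Number of distinct values in a column."""
    return len(set(column))


def is_low_cardinality(column: Sequence[Hashable], max_ratio: float = 0.05) -> bool:
    """Heuristic (assumed, not standard): a column is low cardinality when its
    distinct values are at most `max_ratio` of its row count."""
    if not column:
        return False
    return cardinality(column) / len(column) <= max_ratio


# Hypothetical example columns with 600 rows each.
payment_method = ["card", "cash", "card", "transfer", "card", "cash"] * 100
user_id = [f"user-{i}" for i in range(600)]

print(cardinality(payment_method), is_low_cardinality(payment_method))  # 3 True
print(cardinality(user_id), is_low_cardinality(user_id))                # 600 False
```

In practice the cutoff depends on the dataset and the purpose; an absolute limit on the number of categories is an equally common rule of thumb.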

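In machine learning and data analysis, low-cardinality features are generally easier to encode and train with: encodings such as one-hot add only one column per category, and each category tends to appear often enough for a model to learn its effect.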
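One-hot encoding makes the cost of cardinality concrete: each distinct value becomes its own 0/1 indicator column, so the encoded width equals the number of categories. The sketch below is a plain-Python illustration with a hypothetical `one_hot` helper; libraries such as pandas and scikit-learn provide their own encoders.

```python
def one_hot(column: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode a categorical column: one 0/1 indicator per distinct value."""
    categories = sorted(set(column))  # encoded width == cardinality of the column
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the position of this row's category
        rows.append(row)
    return categories, rows


categories, encoded = one_hot(["cash", "card", "transfer", "card"])
print(categories)  # ['card', 'cash', 'transfer']
print(encoded[0])  # [0, 1, 0]
```

A three-valued payment method adds three columns; encoding a user-ID column the same way would add one column per user.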

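From a database perspective, low-cardinality columns tend to be less selective: each value matches a large fraction of the rows, so an index on such a column narrows a query down very little and can reduce index effectiveness.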
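Selectivity can be estimated as the fraction of rows an equality predicate such as `status = 'active'` matches; the closer that fraction is to 1.0, the less the predicate filters. The sketch below computes it per value for a hypothetical `status` column. When one value covers 90% of a table, reading those rows through an index saves little over a full scan, and a cost-based query planner may choose the scan instead.

```python
from collections import Counter


def equality_selectivity(column: list[str]) -> dict[str, float]:
    """Fraction of rows matched by `column = value` for each distinct value.
    Fractions near 1.0 mean the predicate is not selective."""
    counts = Counter(column)
    total = len(column)
    return {value: n / total for value, n in counts.items()}


# Hypothetical 1,000-row table with a three-valued status column.
status = ["active"] * 900 + ["suspended"] * 80 + ["closed"] * 20
print(equality_selectivity(status))
# {'active': 0.9, 'suspended': 0.08, 'closed': 0.02}
```

Even the rarer values here match far more rows than a lookup on a unique key, which matches exactly one.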

Practical handling often involves grouping rare categories into an “Other” bucket, creating derived features from the …
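Grouping rare categories into an “Other” bucket can be sketched in a few lines: count how often each category occurs and replace those below a frequency threshold with a shared label. The `group_rare` name and the `min_count` threshold are hypothetical; real thresholds are chosen per dataset.

```python
from collections import Counter


def group_rare(column: list[str], min_count: int, other: str = "Other") -> list[str]:
    """Replace categories seen fewer than `min_count` times with a shared bucket."""
    counts = Counter(column)
    return [value if counts[value] >= min_count else other for value in column]


region = ["EU", "EU", "US", "US", "US", "APAC", "LATAM"]
print(group_rare(region, min_count=2))
# ['EU', 'EU', 'US', 'US', 'US', 'Other', 'Other']
```

Computing the counts once on training data and reusing that mapping for new rows keeps the buckets consistent between training and prediction.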

See also: high cardinality, categorical encoding, feature engineering.
