Onehot
One-hot encoding is a method for converting categorical data into a numerical format suitable for machine learning. In one-hot encoding, each category is represented by a binary vector that has the same length as the number of categories in the feature. The vector contains exactly one '1' (the hot position) and all other elements are '0'. The position of the '1' identifies the category.
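As a minimal sketch of the definition above, the following plain-Python function builds one binary vector per value. The function name `one_hot_encode` and its `categories` parameter are hypothetical, chosen only for illustration and not taken from any particular library. It assumes the category order is either supplied by the caller or taken from first appearance in the data.

```python
def one_hot_encode(values, categories=None):
    """Encode each value as a binary vector with a single 1.

    Hypothetical helper for illustration only; not a library API.
    """
    if categories is None:
        # Assumption: use first-seen order when no explicit order is given.
        categories = list(dict.fromkeys(values))

    # Map each category to the position of its '1'.
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        # Vector length equals the number of categories.
        vector = [0] * len(categories)
        # A value outside `categories` raises KeyError here.
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


categories, encoded = one_hot_encode(
    ["red", "green", "blue", "green"],
    categories=["red", "green", "blue"],
)
for value, vector in zip(["red", "green", "blue", "green"], encoded):
    print(value, vector)
```

In this sketch an unseen category raises a `KeyError` rather than being silently encoded, which is one possible design choice. Other implementations may instead map unknown values to an all-zero vector.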
For example, for a feature with three categories (red, green, blue), red becomes [1, 0, 0], green becomes [0, 1, 0], and blue becomes [0, 0, 1].
One-hot encoded features are non-ordinal; the encoding does not imply any ordering between categories. They are
Advantages of one-hot encoding include simplicity and little risk of introducing spurious ordinal relationships. Disadvantages include
To address high cardinality, practitioners may use alternatives such as embedding representations, target encoding, or