categorymoments
Categorymoments is a data-analytic concept for describing and extracting the distributional properties of a numerical feature within the segments defined by a categorical variable. It belongs to feature engineering and exploratory data analysis, where it captures how a variable behaves differently across categories.
Definition: Given a dataset with a numerical feature X and a categorical feature C with categories c
Computation: In offline settings, moments are computed by grouping data by C and calculating sample moments
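As an illustration of the offline, group-then-compute approach, the following is a minimal sketch in plain Python. The function name category_moments, the (category, value) row layout, and the max_order parameter are assumptions made for this example and do not come from any particular library. The sketch uses the 1/n (population) form of the central moments; other conventions, such as the unbiased sample variance, are equally possible.

```python
from collections import defaultdict
from statistics import fmean


def category_moments(rows, max_order=3):
    """Group (category, value) pairs and compute per-category moments.

    Hypothetical helper for illustration. Returns a dict of the form
    {category: {"n": count, "mean": mean, "central": {k: k-th central moment}}}
    for k = 2 .. max_order, using the 1/n (population) normalisation.
    """
    # Step 1: group the numerical values X by the categorical value C.
    groups = defaultdict(list)
    for category, x in rows:
        groups[category].append(x)

    # Step 2: compute the moments within each group.
    result = {}
    for category, values in groups.items():
        mean = fmean(values)
        central = {
            k: fmean((v - mean) ** k for v in values)
            for k in range(2, max_order + 1)
        }
        result[category] = {"n": len(values), "mean": mean, "central": central}
    return result


# Example usage with made-up data.
data = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0)]
moments = category_moments(data)
```

The resulting per-category values can be joined back onto each row by its category to serve as model features.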
Applications: Category-specific moments are used as features for predictive models to capture heterogeneity across categories. They
Limitations: Sparse categories can yield unstable estimates, necessitating smoothing or shrinkage. Higher-order moments can be noisy
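One common form of shrinkage for sparse categories pulls each category mean toward the global mean, with a pull that weakens as the category gains observations. The sketch below shows this additive-smoothing variant only; the function name shrunken_category_means and the prior_weight parameter are assumptions made for this example, and the choice of prior_weight is a tuning decision rather than a fixed rule.

```python
from collections import defaultdict
from statistics import fmean


def shrunken_category_means(rows, prior_weight=5.0):
    """Shrink each category mean toward the global mean.

    Hypothetical helper for illustration. `rows` is a list of
    (category, value) pairs. `prior_weight` acts like a number of
    pseudo-observations located at the global mean, so categories with
    few rows are pulled more strongly toward it.
    """
    groups = defaultdict(list)
    for category, x in rows:
        groups[category].append(x)

    global_mean = fmean(x for _, x in rows)

    return {
        category: (sum(values) + prior_weight * global_mean)
        / (len(values) + prior_weight)
        for category, values in groups.items()
    }


# Example usage with made-up data: "b" has a single observation,
# so its estimate moves furthest from its raw mean of 8.0.
data = [("a", 1.0), ("a", 3.0), ("b", 8.0)]
smoothed = shrunken_category_means(data, prior_weight=2.0)
```

With prior_weight set to 0 the function returns the raw category means, and as prior_weight grows every category converges to the global mean.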