Datapreprocessing
Datapreprocessing, also known as data preprocessing, is the set of techniques applied to raw data to prepare it for analysis and modeling. The aim is to improve data quality and convert the data into a form that algorithms can consume while preserving meaningful information. Datapreprocessing addresses issues such as missing values, noise, inconsistencies, and redundant features, and how well it is done often has a large effect on the performance of downstream models.
Typical steps include data cleaning (handling missing values by imputation or deletion, removing duplicates and outliers),
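The cleaning operations named above (removing duplicates, imputing missing values, removing outliers) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a recommended pipeline. The function name `clean_records`, the `(key, value)` record layout, median imputation, and the 1.5 x IQR outlier rule are all assumptions chosen for the example; real projects would pick an imputation and outlier strategy suited to the data.

```python
from statistics import median, quantiles


def clean_records(rows):
    """Clean a list of (key, value) records; value may be None (missing).

    Hypothetical helper for illustration only.
    """
    # 1. Remove exact duplicate records while preserving the original order.
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)

    # 2. Impute missing values with the median of the observed values
    #    (an assumed strategy; deletion or mean imputation are alternatives).
    observed = [value for _, value in unique if value is not None]
    fill = median(observed)
    imputed = [(key, fill if value is None else value) for key, value in unique]

    # 3. Drop outliers with the 1.5 * IQR rule (an assumed threshold).
    q1, _, q3 = quantiles([value for _, value in imputed], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(key, value) for key, value in imputed if low <= value <= high]


raw = [
    ("a", 10.0),
    ("b", 12.0),
    ("b", 12.0),   # duplicate record
    ("c", None),   # missing value
    ("d", 11.0),
    ("e", 13.0),
    ("f", 500.0),  # extreme value
]
cleaned = clean_records(raw)
```

In this toy input the duplicate `("b", 12.0)` is dropped, the missing value for `"c"` is filled with the median of the observed values, and the extreme record `"f"` falls outside the IQR fence and is removed. The order of the steps matters: imputing before outlier removal, as done here, lets the extreme value influence the fill value, which is one reason the median is used rather than the mean.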
Best practices emphasize deriving transformation parameters from the training data only and applying the same parameters
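The practice of deriving transformation parameters from the training data only can be illustrated with standardization. The sketch below is an assumption-laden example rather than a definitive implementation: the helper names `fit_standardizer` and `apply_standardizer`, the toy values, and the choice of z-score scaling are all hypothetical. The key point is that the mean and standard deviation are computed once, from the training split, and then reused unchanged on held-out data.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Estimate scaling parameters from the training data only."""
    mu = mean(train_values)
    # Fall back to 1.0 if the feature is constant, to avoid dividing by zero.
    sigma = pstdev(train_values) or 1.0
    return mu, sigma


def apply_standardizer(values, params):
    """Apply previously fitted parameters; nothing is re-estimated here."""
    mu, sigma = params
    return [(value - mu) / sigma for value in values]


train = [2.0, 4.0, 6.0, 8.0]
held_out = [5.0, 10.0]

params = fit_standardizer(train)                   # fitted on training data only
train_scaled = apply_standardizer(train, params)
held_out_scaled = apply_standardizer(held_out, params)  # same parameters reused
```

Separating the fitting step from the applying step makes it hard to re-estimate parameters on held-out data by accident. The held-out values are not guaranteed to have zero mean or unit variance after scaling, which is expected: they are expressed in the training data's units.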