datavask
Datavask is a set of processes for preparing data for analysis by cleaning, standardizing, and sanitizing it. In practice, it covers activities that improve data quality and activities that protect privacy by removing or obfuscating sensitive information. Although it is often equated with data cleaning or data cleansing, datavask covers a broader range of techniques and goals.
Data cleaning focuses on accuracy and consistency. Typical tasks include correcting errors, resolving duplicates, normalizing formats,
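As an illustration only, the cleaning tasks named above (normalizing formats and resolving duplicates) might be sketched in Python as follows. The field names, the accepted date formats, and the choice of the email address as the duplicate key are assumptions made for this example, not part of any standard definition of datavask.

```python
from datetime import datetime

# Assumed input date formats for this sketch; real data may need others.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def normalize_record(record):
    """Return a copy with collapsed whitespace, lowercased email, ISO 8601 date."""
    name = " ".join(record["name"].split())
    email = record["email"].strip().lower()
    raw_date = record["signup_date"].strip()
    iso_date = raw_date  # left unchanged if no assumed format matches
    for fmt in DATE_FORMATS:
        try:
            iso_date = datetime.strptime(raw_date, fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {"name": name, "email": email, "signup_date": iso_date}


def clean(records):
    """Normalize every record, then keep the first record seen per email."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = normalize_record(record)
        if normalized["email"] in seen:
            continue  # duplicate after normalization
        seen.add(normalized["email"])
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample data.
raw = [
    {"name": "  Kari   Nordmann ", "email": "Kari@Example.com ", "signup_date": "03.02.2021"},
    {"name": "Kari Nordmann", "email": "kari@example.com", "signup_date": "2021-02-03"},
    {"name": "Ola Hansen", "email": "ola@example.com", "signup_date": "15/07/2020"},
]
cleaned = clean(raw)
```

Normalizing before deduplicating matters here: the first two sample records only become recognizable as duplicates once case and whitespace differences in the email address are removed.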
Privacy- and security-related aspects of datavask involve data masking, pseudonymization, tokenization, and encryption, as well as
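Two of the techniques named above, pseudonymization and data masking, can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the key, the field names, and the masking rule are hypothetical, and a keyed hash is only one possible way to pseudonymize an identifier. Tokenization and encryption are not shown.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would come from a secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Map a value to a stable pseudonym using keyed HMAC-SHA256."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email):
    """Keep the first character of the local part and the domain; mask the rest."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "*" * len(email)  # not a recognizable address: mask everything
    return local[0] + "*" * (len(local) - 1) + "@" + domain


# Hypothetical record.
record = {"customer_id": "12345", "email": "kari@example.com"}
washed = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": mask_email(record["email"]),
}
```

Because the same input and key always yield the same pseudonym, records can still be joined on the pseudonymized identifier. The same property means that anyone holding the key can recompute the mapping, so the key needs the same protection as the original identifiers.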
Effective datavask is usually part of a broader data governance program. It relies on metadata, data lineage,
Common challenges include the trade-off between data utility and privacy, risk of re-identification, scalability, and maintaining
See also: data cleaning, data cleansing, data sanitization, data masking, data governance, differential privacy.