datgen
Datgen is a term used for software tools and libraries designed to generate synthetic or anonymized data for testing, development, benchmarking, and research. The goal is to produce datasets that resemble real data in structure and distribution while avoiding exposure of sensitive information. The name is used by several independent projects across programming languages, each with its own implementation details.
Core capabilities typically include a schema-driven generator that accepts a data model describing entities, fields, data
Operation typically follows a workflow in which a user defines a data model in a schema file,
Limitations include the challenge of achieving high realism, potential bias in generated patterns, and performance considerations