Datensatz
Datensatz, in English dataset, is a collection of data records used for analysis. In statistics and data science, a Datensatz consists of rows (observations) and columns (variables or attributes). Each row represents one instance; each column a feature. Datasets may be structured as tables in CSV or SQL databases, semi-structured as JSON or XML, or, less commonly, unstructured but organized for processing. Metadata describes the dataset, including its name, source, collection date, units, and data quality.
In machine learning, a dataset often includes features and a target label. Datasets are partitioned into training,
Quality and preparation are important: collection, cleaning, normalization, deduplication, and annotation. Provenance information records how the
Ethics and law: privacy, anonymization, consent, and licensing govern sharing and reuse. Open datasets may be
Related concepts include data frames, matrices, and data collections. A well-documented Datensatz supports reliable analysis, transparent