filedataset
In data management, a filedataset is a dataset whose elements are individual files stored in a file system or object store. It provides a unified view over the collection, so the files can be processed in batch or streaming fashion without loading all of their content into memory at once.
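As an illustration of that idea, the following minimal Python sketch treats a local directory tree as a collection of file records and reads each file lazily. It is not the API of any particular library: the names `FileRecord`, `iter_file_records`, and `read_chunks`, and the choice of `path` and `size_bytes` as record fields, are assumptions made for this example only.

```python
import os
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record describing one file; the field names are assumptions."""
    path: str
    size_bytes: int


def iter_file_records(root: str) -> Iterator[FileRecord]:
    """Lazily yield one record per file under `root`, without reading file content."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            yield FileRecord(path=full_path, size_bytes=os.path.getsize(full_path))


def read_chunks(record: FileRecord, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Stream one file's content in fixed-size chunks instead of loading it whole."""
    with open(record.path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk
```

Because both functions are generators, a consumer can loop over `iter_file_records(root)` and, for each record, over `read_chunks(record)`, holding only one chunk in memory at a time. An object store backend would need a different listing and reading mechanism, which this sketch does not cover.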
Data model: a filedataset exposes a sequence of records, each representing a file with fields such as
Backends and storage: filedatasets can be backed by local disks, network file systems, or cloud storage services
Processing patterns: filedatasets are commonly consumed by data processing frameworks for batch ETL or machine learning
Advantages and challenges: advantages include simplicity, compatibility with existing file stores, and natural fit for multi-file
See also: dataset, file system, object storage, data lake, data pipeline.