Datasjø
A datasjø (plural: datasjøer) is a centralized data repository that stores large volumes of data from diverse sources in their native formats. The term is widely used in Norwegian IT for what is commonly called a data lake in English. Datasjøer are designed to support analytics, data science, and machine learning by providing scalable, flexible access to raw data.
Key characteristics include storing data in its raw form and applying a schema when data is read (schema-on-read) rather than when it is written.
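To make schema-on-read concrete, here is a minimal sketch that assumes raw records arrive as JSON lines and are stored untouched. The function `read_with_schema` and the two example schemas are hypothetical illustrations, not part of any particular datasjø product: each consumer supplies its own field-to-type mapping at read time, and a field missing from a raw record simply becomes `None` instead of being rejected at write time.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional

# Raw records are kept exactly as they arrived (JSON lines as strings).
# No schema is enforced when they are written.
RAW_EVENTS: List[str] = [
    '{"user": "u1", "ts": "1700000000", "action": "click", "extra": {"x": 3}}',
    '{"user": "u2", "ts": "1700000042", "action": "view"}',
    '{"user": "u3", "action": "click"}',  # no timestamp: still accepted on write
]

# A schema here is just a mapping from field name to a casting function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(raw_lines: Iterable[str], schema: Schema) -> List[Dict[str, Optional[Any]]]:
    """Apply a schema only at read time: project and cast the requested fields."""
    rows: List[Dict[str, Optional[Any]]] = []
    for line in raw_lines:
        record = json.loads(line)
        row: Dict[str, Optional[Any]] = {}
        for field_name, cast in schema.items():
            value = record.get(field_name)
            # Missing fields become None instead of failing the whole record.
            row[field_name] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two hypothetical consumers interpret the same raw data differently.
analytics_schema: Schema = {"user": str, "ts": int}
ml_schema: Schema = {"action": str}
```

For example, `read_with_schema(RAW_EVENTS, analytics_schema)` casts `ts` to an integer for the first two records and yields `None` for the third, while `ml_schema` ignores timestamps entirely. The raw data is never rewritten, so a new interpretation only requires a new schema.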
Architecturally, a datasjø combines an ingestion layer for batch and streaming data with a storage layer.
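The split between batch and streaming ingestion feeding a shared storage layer can be sketched as follows. This is a simplified illustration under several assumptions: `ObjectStore` is a hypothetical in-memory stand-in for real storage, the `raw/<source>/date=YYYY-MM-DD/part-NNNNN.<ext>` key layout is an invented convention, and `StreamIngestor` buffers events into small micro-batch files rather than using any specific streaming platform.

```python
import json
from datetime import datetime
from typing import Dict, List, Optional


class ObjectStore:
    """Hypothetical in-memory stand-in for the storage layer."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def list_prefix(self, prefix: str) -> List[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))


def write_part(store: ObjectStore, source: str, ts: datetime, data: bytes, ext: str) -> str:
    """Write one file into a hypothetical raw/<source>/date=YYYY-MM-DD/ partition."""
    prefix = f"raw/{source}/date={ts:%Y-%m-%d}/"
    part = len(store.list_prefix(prefix))  # next free part number in this partition
    key = f"{prefix}part-{part:05d}.{ext}"
    store.put(key, data)
    return key


def ingest_batch(store: ObjectStore, source: str, payload: bytes, ts: datetime, ext: str) -> str:
    """Batch path: land a whole extract byte-for-byte, without parsing it."""
    return write_part(store, source, ts, payload, ext)


class StreamIngestor:
    """Streaming path: buffer events and flush them as small JSON-lines files."""

    def __init__(self, store: ObjectStore, source: str, max_batch: int = 100) -> None:
        self.store = store
        self.source = source
        self.max_batch = max_batch
        self.buffer: List[dict] = []
        self.first_ts: Optional[datetime] = None

    def on_event(self, event: dict, ts: datetime) -> Optional[str]:
        if not self.buffer:
            # Simplification: a micro-batch is filed under its first event's date.
            self.first_ts = ts
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        if not self.buffer or self.first_ts is None:
            return None
        data = "\n".join(json.dumps(e) for e in self.buffer).encode()
        key = write_part(self.store, self.source, self.first_ts, data, "jsonl")
        self.buffer.clear()
        return key
```

Both paths end in the same storage layer and the same partition layout, so downstream readers do not need to know whether a file arrived as a nightly extract or as a flushed stream buffer.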
Common data sources include logs, clickstream data, sensor data, transactional records, images, and scientific measurements.
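Because these sources range from text logs to binary images, one common pattern is to land every payload as opaque bytes and describe it with a small metadata record. The sketch below assumes a hypothetical in-memory `LANDING` area and an invented `.meta.json` sidecar convention; the content type is declared by the caller rather than detected, and the SHA-256 digest lets later jobs verify that the stored bytes are unchanged.

```python
import hashlib
import json
from datetime import datetime
from typing import Dict

# Hypothetical in-memory landing area: object key -> bytes.
LANDING: Dict[str, bytes] = {}


def land_native(source: str, name: str, payload: bytes, content_type: str,
                received: datetime) -> Dict[str, object]:
    """Store a payload byte-for-byte and write a JSON metadata sidecar next to it."""
    key = f"raw/{source}/{name}"
    LANDING[key] = payload  # no parsing or conversion: the native format is preserved
    meta: Dict[str, object] = {
        "source": source,
        "key": key,
        "content_type": content_type,  # declared by the caller, not sniffed
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "received_at": received.isoformat(),
    }
    LANDING[key + ".meta.json"] = json.dumps(meta, sort_keys=True).encode()
    return meta
```

The same function handles a log line, a sensor CSV extract, or image bytes; only the declared `content_type` differs, which keeps ingestion uniform while leaving interpretation to the consumers.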
Benefits of a datasjø include centralized data management, flexibility in data types, and accelerated analytics.
Challenges include data quality management, governance and compliance, data discoverability, cost control, data versioning, and performance.
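Discoverability and versioning are often addressed with a metadata catalog. The following is a minimal sketch of that idea, not a description of any specific catalog product: the `Catalog`, `CatalogEntry`, and `DatasetVersion` classes and the example dataset names are hypothetical. Each dataset gets a description and tags for keyword search, and a new version is recorded only when the content hash changes, so earlier versions remain addressable.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    version: int
    content_hash: str
    location: str


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: List[str]
    versions: List[DatasetVersion] = field(default_factory=list)


class Catalog:
    """Hypothetical metadata catalog supporting search and simple versioning."""

    def __init__(self) -> None:
        self.entries: Dict[str, CatalogEntry] = {}

    def register(self, name: str, description: str, tags: List[str]) -> CatalogEntry:
        return self.entries.setdefault(name, CatalogEntry(name, description, list(tags)))

    def add_version(self, name: str, data: bytes, location: str) -> DatasetVersion:
        entry = self.entries[name]
        digest = hashlib.sha256(data).hexdigest()
        if entry.versions and entry.versions[-1].content_hash == digest:
            return entry.versions[-1]  # unchanged content: no new version is created
        new = DatasetVersion(len(entry.versions) + 1, digest, location)
        entry.versions.append(new)
        return new

    def get(self, name: str, version: Optional[int] = None) -> DatasetVersion:
        versions = self.entries[name].versions
        if version is None:
            return versions[-1]  # latest version
        if version < 1 or version > len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return sorted(
            e.name for e in self.entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term == t.lower() for t in e.tags)
        )
```

Hashing content rather than trusting file names means a re-delivered but identical extract does not inflate the version history, while any real change produces a new, separately retrievable version.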
For further reading, see Data lake, Data lakehouse, and Metadata management. This article uses the Norwegian term datasjø.