ETL
ETL (Extract, Transform, Load) is a data integration approach for moving data from multiple source systems into a centralized repository such as a data warehouse or data lake. The extract step retrieves data from heterogeneous sources; the transform step applies cleansing, normalization, deduplication, aggregation, and business rules; and the load step writes the processed data to the target system.
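The three steps can be illustrated with a minimal sketch in Python using only the standard library. Everything specific here is an assumption made for illustration: the sample CSV data, the `customer_totals` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, and an in-memory SQLite database stands in for a data warehouse. A production pipeline would typically rely on a dedicated ETL tool or framework instead.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from databases, files, or APIs.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20
2,BOB,20
3,alice,4.50
4,carol,not-a-number
"""


def extract(csv_text):
    """Extract: read raw rows from a CSV source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: cleanse, normalize, deduplicate, then aggregate per customer."""
    seen_ids = set()
    totals = {}
    for row in rows:
        try:
            order_id = int(row["order_id"])  # type casting
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip rows that fail type casting
        if order_id in seen_ids:
            continue  # deduplication on order_id
        seen_ids.add(order_id)
        customer = row["customer"].strip().lower()  # normalization
        totals[customer] = totals.get(customer, 0.0) + amount  # aggregation
    return sorted(totals.items())


def load(conn, records):
    """Load: write the processed records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    conn.commit()


def run_pipeline(conn, csv_text=RAW_CSV):
    """Run extract, transform, and load in order, then read back the target."""
    load(conn, transform(extract(csv_text)))
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping each step as a separate function mirrors the separation of concerns in the description above: the source format can change without touching the transformation rules, and the target can be swapped without touching either.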
Historically, ETL ran as a batch process scheduled during off-peak hours, often using dedicated ETL tools.
Typical sources include relational databases, files, APIs, and streaming platforms. Transformations can include cleaning, type casting,
Deployment can be on-premises, in the cloud, or hybrid. ETL can be batch-oriented or near real-time when
Data quality and governance considerations include validation, lineage, metadata, auditing, and error handling, as well as
Common tool categories include commercial platforms such as Informatica, IBM DataStage, and Microsoft SQL Server Integration