ELT (Extract - Load - Transform)
ELT (Extract - Load - Transform) is a data processing approach where data is extracted from source systems, loaded into a destination (typically a data warehouse or lake), and then transformed in place.
Core Philosophy
A core tenet of ELT is that data stays untouched through the Extract and Load stages, so the raw data is always accessible in the destination. Because an unmodified copy already exists there, it can be transformed, or re-transformed later, without syncing from the source again.
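A minimal sketch of this idea, using Python's built-in `sqlite3` as a stand-in for a warehouse. The source rows and the table names `raw_orders` and `orders_clean` are hypothetical, chosen only for illustration. Values are loaded exactly as received, and cleaning happens afterwards inside the destination.

```python
import sqlite3

# Hypothetical source records; in practice these come from an operational system.
SOURCE_ROWS = [
    ("1001", " alice@example.com ", "19.99", "2024-01-05"),
    ("1002", "BOB@example.com", "5.00", "2024-01-05"),
    ("1003", "carol@example.com", "not-a-number", "2024-01-06"),
]

def extract():
    """Extract: read rows from the source without changing them."""
    return list(SOURCE_ROWS)

def load(conn, rows):
    """Load: store every value as-is (TEXT) in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, email TEXT, amount TEXT, ordered_on TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)

def transform(conn):
    """Transform: build a cleaned table in the destination from the raw table."""
    conn.execute("DROP TABLE IF EXISTS orders_clean")
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(email))        AS email,
               CAST(amount AS REAL)      AS amount,
               ordered_on
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )

conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
clean_rows = conn.execute(
    "SELECT order_id, email, amount FROM orders_clean ORDER BY order_id"
).fetchall()
```

The malformed row is excluded from `orders_clean` but still present in `raw_orders`, so a later transformation can decide to handle it differently.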
Key Benefits
- Preserves raw data: Original data remains available for reprocessing
- Flexibility: Transformations can be changed without re-extracting
- Efficiency: Avoids redundant data movement
- Auditability: Raw data serves as a source of truth
- Iterative development: Transformations can be refined without re-syncing
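The flexibility and iterative-development points can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: `raw_events`, `revenue`, and the sample rows are hypothetical, and `sqlite3` stands in for the warehouse. The raw data is synced once, and the derived table is rebuilt twice with different rules.

```python
import sqlite3

stats = {"extract_calls": 0}  # counts how often the (hypothetical) source is read

def extract_and_load(conn):
    """Sync raw events from the source into the destination."""
    stats["extract_calls"] += 1
    conn.execute("CREATE TABLE raw_events (user_id TEXT, amount_cents TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [("u1", "1250"), ("u1", "-250"), ("u2", "400")],
    )

def build_revenue(conn, select_sql):
    """(Re)build the derived table from raw data using the given SELECT."""
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute(f"CREATE TABLE revenue AS {select_sql}")
    return conn.execute(
        "SELECT user_id, total FROM revenue ORDER BY user_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
extract_and_load(conn)

# First attempt: sums every row, so refunds (negative amounts) are netted in.
v1 = build_revenue(
    conn,
    "SELECT user_id, SUM(CAST(amount_cents AS INTEGER)) AS total "
    "FROM raw_events GROUP BY user_id",
)

# Refined rule: count only positive amounts as gross revenue.
v2 = build_revenue(
    conn,
    "SELECT user_id, SUM(CAST(amount_cents AS INTEGER)) AS total "
    "FROM raw_events WHERE CAST(amount_cents AS INTEGER) > 0 "
    "GROUP BY user_id",
)
```

Both versions are computed from the same `raw_events` table, and the source is read only once.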
Comparison to ETL
Traditional ETL (Extract - Transform - Load) transforms data before loading it. By contrast:
- ELT loads raw data first, then transforms
- ELT leverages the processing power of modern data warehouses
- ELT allows multiple transformation pipelines on the same raw data
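The contrast can be sketched side by side. The sample rows and the table and view names are hypothetical, and `sqlite3` again stands in for a warehouse. In the ETL variant only the aggregate reaches the destination. In the ELT variant the raw rows are loaded, and two independent transformations are defined on top of them.

```python
import sqlite3

# Hypothetical source rows: (region, product, units)
SOURCE = [("eu", "widget", 3), ("eu", "gadget", 1), ("us", "widget", 2)]

# ETL: aggregate before loading; only the result reaches the destination.
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE units_by_region (region TEXT, units INTEGER)")
totals = {}
for region, _product, units in SOURCE:
    totals[region] = totals.get(region, 0) + units
etl.executemany("INSERT INTO units_by_region VALUES (?, ?)", sorted(totals.items()))

# ELT: load rows unchanged, then define transformations inside the destination.
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE raw_sales (region TEXT, product TEXT, units INTEGER)")
elt.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", SOURCE)
elt.execute(
    "CREATE VIEW units_by_region AS "
    "SELECT region, SUM(units) AS units FROM raw_sales GROUP BY region"
)
elt.execute(
    "CREATE VIEW units_by_product AS "
    "SELECT product, SUM(units) AS units FROM raw_sales GROUP BY product"
)

by_region_etl = etl.execute("SELECT * FROM units_by_region ORDER BY region").fetchall()
by_region_elt = elt.execute("SELECT * FROM units_by_region ORDER BY region").fetchall()
by_product_elt = elt.execute("SELECT * FROM units_by_product ORDER BY product").fetchall()
```

Both approaches give the same per-region totals, but the per-product breakdown is only available in the ELT destination, because the ETL destination never received the product column.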
Use Cases
- Modern data warehouses (Snowflake, BigQuery, Redshift)
- Data lakes and lakehouses
- Real-time analytics pipelines
- Multi-consumer data architectures
Tools
Extract-Load (EL)
- Airbyte: Open-source data integration platform
- Meltano: Open-source ELT platform
Transform (T)
- dbt: Data transformation tool that runs in your warehouse
Orchestrator
- Apache Airflow: Workflow orchestration platform
- Kestra: Open-source workflow orchestration platform (kestra.io)
- Prefect: Modern workflow orchestration platform
Visualize
- Metabase: Open-source business intelligence tool
- Superset: Apache Superset, open-source data visualization platform
- Grafana: Analytics and monitoring platform
- Redash: Open-source SQL query and visualization tool
- Looker: Business intelligence and data analytics platform
Data QA
- Great Expectations: Data quality and validation framework (greatexpectations.io)
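As a rough idea of what data QA on a transformed table involves, here is a hand-rolled sketch in plain Python and SQL. It does not use the Great Expectations API. The function `run_checks`, the table `orders_clean`, and the three checks are assumptions made for illustration.

```python
import sqlite3

def run_checks(conn, table):
    """Return a dict of check name -> passed, computed with plain SQL."""
    null_ids = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE order_id IS NULL"
    ).fetchone()[0]
    dup_ids = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT order_id FROM {table} GROUP BY order_id HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    negative = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE amount < 0"
    ).fetchone()[0]
    return {
        "order_id_not_null": null_ids == 0,
        "order_id_unique": dup_ids == 0,
        "amount_non_negative": negative == 0,
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_clean (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders_clean VALUES (?, ?)",
    [(1, 19.99), (2, 5.0), (2, -1.0)],  # deliberately contains a duplicate id and a negative amount
)
results = run_checks(conn, "orders_clean")
```

Because the raw data is still in the destination, a failed check can be fixed by correcting the transformation and rebuilding the table, without re-extracting anything.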
Related Concepts
- [[Operational Data]] — source data for extraction
- [[Analytical Data]] — transformed data for analysis
- [[Analytically Operational Data]] — data that bridges both worlds
- [[Architecture Decision (AD)]]
Definition based on data engineering practice.
Linked References
- [[Analytical Data]]
Data that helps humans make better decisions.
- [[Analytically Operational Data]]
Data that automatically helps someone make better decisions.
- [[Operational Data]]
Data whose purpose is to remember things.