Extract-Transform-Load (ETL)
Extract-Transform-Load (ETL) are data integration practices and tools:
-
Extract
-
data-mining from different sources, different data formats, …
-
-
Transform
-
transformation of data to a desired form
-
-
Load
-
loading/storing of data to/from a target database/data warehouse
-
ETL Applications
-
ETL tools has many application areas today:
-
Different sources and formats data integration (text documents, CSV, XLS spreadsheets, databases, XML data, …)
-
Data consolidation (transformations of data and data "cleaning")
-
Storing of data into huge databases - data warehouses for management applications
-
Data migration (data transfers to different platforms, databases, etc.
-
-
ETL systems are called as "a critical building block to a successful business intelligence deployment".
Implementation
There is a lot of (not only java-based) implementations, many have a GUI, allowing to graphically design transformation flows.
-
Clover ETL
-
http://www.cloveretl.org - open source ETL tool including GUI
-
-
Microsoft SQL Server Integration Services
-
Octopus Java/XML ETL Tool
-
java-etl
-
Kettle
Examples - Clover ETL
Commercially developed (Javlin company, FI industry partner, http://www.cloveretl.org) open-source tool containing:
-
Clover Engine
-
The kernel processing transformations. Contains connectors to external data sources and targets.
-
-
Clover Server
-
deployment platform for transformation execution (incl. planing and monitoring) in a real life.
-
-
Clover Designer
-
tool for graphical design of transformation graphs (based on Eclipse platform).
-
Clover ETL - Designer
Figure 1 - Clover ETL Screenshot
Questions
ETL implementation and deployment on a huge data involves some problems, that are not usual in a different areas:
-
transformations should be optimized to a speed as well as to allow huge data processing,
-
effective memory models used to (temporal) store XML data - unable to use common "in memory" tree models,
-
definability, maintainability and verifiability of wide transformation networks - visual tools + formal methods.