Extract-Transform-Load (ETL)

Extract-Transform-Load (ETL) are data integration practices and tools:

  • Extract

    • data-mining from different sources, different data formats, …

  • Transform

    • transformation of data to a desired form

  • Load

    • loading/storing of data to/from a target database/data warehouse

ETL Applications

  • ETL tools has many application areas today:

    1. Different sources and formats data integration (text documents, CSV, XLS spreadsheets, databases, XML data, …)

    2. Data consolidation (transformations of data and data "cleaning")

    3. Storing of data into huge databases - data warehouses for management applications

    4. Data migration (data transfers to different platforms, databases, etc.

  • ETL systems are called as "a critical building block to a successful business intelligence deployment".

Implementation

There is a lot of (not only java-based) implementations, many have a GUI, allowing to graphically design transformation flows.

Examples - Clover ETL

Commercially developed (Javlin company, FI industry partner, http://www.cloveretl.org) open-source tool containing:

  • Clover Engine

    • The kernel processing transformations. Contains connectors to external data sources and targets.

  • Clover Server

    • deployment platform for transformation execution (incl. planing and monitoring) in a real life.

  • Clover Designer

    • tool for graphical design of transformation graphs (based on Eclipse platform).

Clover ETL - Designer

Figure 1 - Clover ETL Screenshot

Clover ETL screenshot

Questions

ETL implementation and deployment on a huge data involves some problems, that are not usual in a different areas:

  • transformations should be optimized to a speed as well as to allow huge data processing,

  • effective memory models used to (temporal) store XML data - unable to use common "in memory" tree models,

  • definability, maintainability and verifiability of wide transformation networks - visual tools + formal methods.