ETL = Extract – Transform – Load
During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Very often it is not possible to identify the specific subset of interest in advance, so more data than necessary has to be extracted and the relevant data is identified at a later point in time. Depending on the source system’s capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the interval may range from days or hours down to minutes or near real time. Web server log files, for example, can easily grow to hundreds of megabytes in a very short period of time.
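The "extract more than necessary, identify the relevant data later" pattern can be sketched as follows. This is only an illustration under assumptions: the in-memory SQLite database, the `orders` table, its columns, and the helper names `extract_all` and `select_relevant` are all hypothetical and stand in for a real source system.

```python
import sqlite3


def extract_all(conn):
    """Extract every row, assuming the source cannot pre-select the subset of interest."""
    cursor = conn.execute("SELECT id, region, amount FROM orders ORDER BY id")
    return cursor.fetchall()


def select_relevant(rows, region):
    """Identify the relevant subset at a later point, after extraction."""
    return [row for row in rows if row[1] == region]


# Hypothetical source system: an in-memory database with a small orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.5), (3, "EU", 7.25)],
)
conn.commit()

extracted = extract_all(conn)                 # more data than strictly needed
relevant = select_relevant(extracted, "EU")   # filtering happens after extraction
print(len(extracted), len(relevant))
```

When the source system can filter cheaply, pushing the condition into the query would reduce the extracted volume; the sketch deliberately shows the case where that is not possible.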
After data is extracted, it has to be physically transported to the target system or to an intermediate system for further processing. Depending on the chosen means of transport, some transformations can be done during this process, too.
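A transformation applied while data is being transported might look like the sketch below. The function name `transport`, the comma-separated record layout, and the use of in-memory text streams in place of real source and target systems are assumptions made for illustration.

```python
import io


def transport(source, target, transform=None):
    """Copy records line by line from source to target.

    If a transform is given, it is applied to each record in flight,
    so no separate transformation pass is needed after transport.
    """
    count = 0
    for line in source:
        record = line.rstrip("\n")
        if transform is not None:
            record = transform(record)
        target.write(record + "\n")
        count += 1
    return count


def trim_fields(record):
    """Hypothetical in-flight transformation: strip whitespace around each field."""
    return ",".join(field.strip() for field in record.split(","))


# In-memory streams stand in for the source and target systems.
source = io.StringIO("alice ,42\nbob , 7\n")
target = io.StringIO()
moved = transport(source, target, transform=trim_fields)
print(moved, repr(target.getvalue()))
```

Processing one record at a time keeps memory use flat regardless of how large the extracted data is.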
ETL is a bridge for bi-directional flow.
It can work in either direction. ETL does not necessarily extract only from the source and load only into the destination after transformation; the reverse flow, from destination back to source, is possible as well.
Data migration is the process of transferring data between storage types or formats. An automated migration frees up human resources from tedious tasks. Jobs of moderate to high complexity involve design, extraction, cleansing, load, and verification.
It is usually associated with moving data from remote locations to a central location, or with combining data after an acquisition or merger.
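The extraction, cleansing, load, and verification steps of a small format migration can be sketched as below. The CSV-to-JSON direction, the `id` and `name` columns, the cleansing rule (drop rows without an id), and the function name `migrate_csv_to_json` are hypothetical choices for the example, not a prescribed procedure.

```python
import csv
import io
import json


def migrate_csv_to_json(csv_text):
    """Migrate records from CSV to JSON and verify the result."""
    # Extraction: read every record from the old format.
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Cleansing: trim whitespace and drop records that lack an id.
    cleaned = [{key: value.strip() for key, value in row.items()} for row in rows]
    cleaned = [row for row in cleaned if row["id"]]

    # Load: write the records in the new format.
    json_text = json.dumps(cleaned)

    # Verification: read the target back and compare it with what was loaded.
    loaded = json.loads(json_text)
    if loaded != cleaned:
        raise ValueError("verification failed: target differs from cleansed data")
    return json_text, len(rows), len(loaded)


csv_text = "id,name\n1, Ada \n,missing id\n2,Grace\n"
json_text, extracted_count, loaded_count = migrate_csv_to_json(csv_text)
print(extracted_count, loaded_count)
```

Returning both counts makes it easy to report how many records were dropped during cleansing.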
Data integration is the process of combining data residing at different sources and providing a unified view of it. It emerges in both commercial and scientific fields and is the focus of extensive theoretical work. It is also referred to as Enterprise Information Integration.
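A unified view over two sources with different schemas can be sketched as follows. The two record lists, their field names (`customer_id`, `cust`, `name`, `balance`), and the function `unified_view` are invented for illustration; real integration work also has to handle conflicting values and unreliable keys.

```python
def unified_view(crm_rows, billing_rows):
    """Combine two sources with different schemas into one view keyed by customer."""
    view = {}
    for row in crm_rows:
        view[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "balance": None,  # unknown until the billing source supplies it
        }
    for row in billing_rows:
        # The billing source calls the key "cust"; map it onto the shared key.
        entry = view.setdefault(
            row["cust"],
            {"customer_id": row["cust"], "name": None, "balance": None},
        )
        entry["balance"] = row["balance"]
    return [view[key] for key in sorted(view)]


# Hypothetical sources: a CRM system and a billing system.
crm = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Grace"}]
billing = [{"cust": 1, "balance": 120.0}, {"cust": 3, "balance": 15.5}]

combined = unified_view(crm, billing)
for record in combined:
    print(record)
```

Customers known to only one source are kept, with the missing attribute left as `None`, so the view does not silently lose data.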