The acronym ETL (Extract, Transform, and Load) currently refers to gathering all needed data from multiple sources. Earlier efforts, with the advent of the RDBMS, split a single source file into multiple tables. The purpose was to allow incremental builds and updates without the overhead of complete reloads. The pendulum seems to have swung back toward a need for a single master file. The main effort in this chapter is to transform the raw information provided into new and more meaningful variables. Hierarchies within the information must be attended to in order to avoid spurious joins.
This site introduces the reader to streaming or recurrent data in several different settings. The ability to sub-aggregate data by time allows the investigator to add distributional information into the equation. These additions contribute value that is not readily apparent in the raw data. The reader is shown opportunities to reduce the error term in modeling by expanding the information set to include these transformations.
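As a rough illustration of sub-aggregating recurrent data by time, the sketch below collapses a small stream of events into one record per customer per week and attaches simple distributional summaries (count, mean, standard deviation, minimum, maximum). The record layout, the customer identifiers, the amounts, and the function name `aggregate_by_week` are all hypothetical, chosen only for this example; the choice of ISO weeks as the time window is likewise an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

# Hypothetical raw stream: (customer_id, transaction_date, amount).
events = [
    ("A", date(2024, 1, 1), 10.0),
    ("A", date(2024, 1, 3), 30.0),
    ("A", date(2024, 1, 9), 20.0),
    ("B", date(2024, 1, 2), 5.0),
    ("B", date(2024, 1, 4), 5.0),
]

def aggregate_by_week(rows):
    """Collapse raw events into one summary per (customer, ISO year, ISO week)."""
    buckets = defaultdict(list)
    for customer, day, amount in rows:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        buckets[(customer, iso[0], iso[1])].append(amount)

    summary = {}
    for key, amounts in sorted(buckets.items()):
        summary[key] = {
            "count": len(amounts),
            "mean": mean(amounts),
            "std": pstdev(amounts),  # population standard deviation within the window
            "min": min(amounts),
            "max": max(amounts),
        }
    return summary

weekly = aggregate_by_week(events)
for key, stats in weekly.items():
    print(key, stats)
```

Each summary row can then be joined back to a single master record per customer, so that a model sees the spread of values within each window as well as their level.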