Data Life Cycle

Data life cycle: the sequence of steps that all business data goes through, from creation through use and storage to final disposal.

1. Define

Define what data the business needs and where to capture or retrieve it.

Determining what data the business needs, and where that data will be retrieved from, increases the likelihood that the selected data is relevant to the goals of the data collection.

2. Capture/Creation

Obtain the data, either by creating it internally or by capturing it from an external source.

Internal data is a digital asset created by the company manually, automatically, or semi-automatically.

For external sources, consider the integrity, safety, and copyright of the data. A contract with the provider might be needed.

3. Preparation

Determine whether the data is complete, clean, current, encrypted, and user-friendly.

Enhancing completeness and integrity of data: any time data is moved from one location to another, it is possible that some of the required data was lost during the capture process. Completeness can be checked in 4 steps:

•Compare the number of records expected vs. the number actually received.

•Compare descriptive statistics for numeric fields, if you have access to checksums from the original data source. Comparing those statistics helps to detect missing data or incorrectly formatted fields.

•Validate that field formats are consistent with the source, to ensure that the formatting transferred correctly.

•Compare character limits for the attributes in the source file with those in the new file.
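The four checks above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the sample CSV data, the field names (amount, invoice_date, customer), the expected date format, and the helper names load_rows and completeness_report are all assumptions made for the example.

```python
import csv
import io
import math
import re

# Hypothetical source extract.
SOURCE_CSV = """invoice_id,amount,invoice_date,customer
1001,250.00,2024-01-05,Acme Corporation
1002,-40.50,2024-01-06,Globex
1003,99.99,2024-01-07,Initech Limited
"""

# Hypothetical copy: one record is missing, one date changed format,
# and one customer name was truncated to 10 characters.
TARGET_CSV = """invoice_id,amount,invoice_date,customer
1001,250.00,2024-01-05,Acme Corpo
1002,-40.50,06/01/2024,Globex
"""

# Assumed format of invoice_date in the source (YYYY-MM-DD).
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def completeness_report(source, target):
    """Return True/False for each of the four completeness checks."""
    src_amounts = [float(r["amount"]) for r in source]
    tgt_amounts = [float(r["amount"]) for r in target]
    return {
        # 1. Number of records expected vs. actual.
        "record_count": len(source) == len(target),
        # 2. A descriptive statistic (here, the total) for a numeric field.
        "amount_total": math.isclose(sum(src_amounts), sum(tgt_amounts)),
        # 3. Field format in the copy is consistent with the source format.
        "date_format": all(
            DATE_FORMAT.fullmatch(r["invoice_date"]) is not None for r in target
        ),
        # 4. Longest value of a text attribute, to catch truncation.
        "max_length": max((len(r["customer"]) for r in source), default=0)
        == max((len(r["customer"]) for r in target), default=0),
    }


report = completeness_report(load_rows(SOURCE_CSV), load_rows(TARGET_CSV))
for check, passed in report.items():
    print(check, "OK" if passed else "MISMATCH")
```

With this sample data every check reports a mismatch, because the copy was deliberately damaged in four different ways. Comparing the source against itself passes all four.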

Data integration: when data is sourced externally, it is important to design the data architecture so that the data integrates properly and is updated/mirrored correctly.

Data quality is important. Cleaning data involves:

•remove unnecessary headers

•clean leading zeros and non-printable characters

•format negative numbers consistently

•identify and correct inconsistencies across the data in general

•address inconsistent data types
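As a rough illustration of the cleaning steps listed above, here is a small Python sketch. The sample rows, the column names (account, amount), and the helper names are hypothetical. The negative-number styles handled, "(40.50)" and "12.00-", are assumed examples of the inconsistencies such data might contain.

```python
def strip_non_printable(value):
    """Remove non-printable characters (e.g. NUL bytes) and trim whitespace."""
    return "".join(ch for ch in value if ch.isprintable()).strip()


def strip_leading_zeros(value):
    """Remove leading zeros, keeping a single "0" if nothing else remains."""
    return value.lstrip("0") or "0"


def normalize_negative(value):
    """Convert "(40.50)", "40.50-" or "-40.50" style amounts to a float."""
    text = value.replace(",", "").strip()
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    if text.endswith("-"):
        return -float(text[:-1])
    return float(text)


def clean_rows(rows):
    """Drop repeated header rows and clean every field of the data rows."""
    header = rows[0]
    cleaned = []
    for row in rows[1:]:
        if row == header:
            continue  # unnecessary repeated header, e.g. from a page break
        account = strip_leading_zeros(strip_non_printable(row[0]))
        # Storing the amount as a float addresses the inconsistent data type.
        amount = normalize_negative(strip_non_printable(row[1]))
        cleaned.append({"account": account, "amount": amount})
    return cleaned


# Hypothetical raw extract, already split into fields.
RAW = [
    ["account", "amount"],
    ["000123", "(40.50)"],
    ["account", "amount"],
    ["0457\x00", "1,250.00"],
    ["000", "12.00-"],
]

cleaned = clean_rows(RAW)
print(cleaned)
```

Whether leading zeros should be removed depends on the field: for identifiers in which the zeros are significant, that step would be skipped.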

Data encryption: applied selectively to data while it is stored and while it is moved.

4. Synthesis

The bridge between preparation and usage. This step is not always necessary, but it may be needed to add to the data you already have so that you can use it for your own purposes.

5. Analytics and usage

The data is ready for practical use in the organization to create reports and inform decisions. This stage lasts as long as the data remains useful. It focuses only on internal use.

6. Publication

Data prepared for internal users may also be shared with external users. Take care over what is shared and with whom.

7. Archival

As the need for a data set declines, it is moved from an active system to a passive system.

This frees up storage resources, improves active system performance, and reduces security risk.

Archived data is tested for accuracy and completeness before and after the move.
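One simple way to test a data set before and after archival is to compare a checksum of the file in the active system with a checksum of the archived copy. The sketch below assumes file-based data and uses SHA-256. The function names sha256_of and archive_file, the file name, and the directory layout are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(active_path, archive_dir):
    """Copy a file to the archive, verify the copy, then remove the original."""
    active_path = Path(active_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    before = sha256_of(active_path)           # test before archiving
    archived_path = archive_dir / active_path.name
    shutil.copy2(active_path, archived_path)
    after = sha256_of(archived_path)          # test after archiving

    if before != after:
        archived_path.unlink()
        raise ValueError("archived copy does not match the original")

    active_path.unlink()                      # free up the active system
    return archived_path


# Usage with temporary directories standing in for the two systems.
with tempfile.TemporaryDirectory() as workdir:
    active = Path(workdir) / "active" / "sales_2019.csv"
    active.parent.mkdir()
    active.write_text("region,units\nnorth,10\n")

    archived = archive_file(active, Path(workdir) / "archive")
    still_active = active.exists()
    archived_text = archived.read_text()
```

A matching checksum shows that the archived bytes are identical to the original. It does not show that the original was accurate in the first place, so content-level checks may still be needed.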

8. Purging

The data is no longer useful, and no other requirement obliges the business to retain it. Make sure it is completely purged.

Types of Data Collection:

Extract, transform, and load (ETL): data that already exists is extracted from its original source, transformed into useful information, and loaded into the tool used for analysis.

Steps: ETL covers capture, preparation, and synthesis, but it is a more specific method for collecting existing data in order to answer a specific data analysis question.
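A minimal ETL sketch in Python, using only the standard library, might look like the following. The sample sales data, the column names, and the choice of an in-memory SQLite database as the analysis tool are assumptions made for illustration. The analysis question here is "what is the revenue per region?".

```python
import csv
import io
import sqlite3

# Hypothetical existing data at its original source.
SALES_CSV = """region,units,unit_price
north,10,2.50
south,4,3.00
north,6,2.50
"""


def extract(text):
    """Extract: read the existing records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert text fields to numbers and derive revenue."""
    return [
        (row["region"], int(row["units"]) * float(row["unit_price"]))
        for row in rows
    ]


def load(records, connection):
    """Load: write the transformed records into the analysis database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, revenue REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SALES_CSV)), connection)

# Answer the analysis question from the loaded data.
revenue_by_region = dict(
    connection.execute(
        "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
    )
)
print(revenue_by_region)
```

Keeping extract, transform, and load as separate functions mirrors the three stages, so each one can be checked or replaced on its own.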

Active data collection: new data is collected directly from people who knowingly provide it, for example through surveys.

Passive data collection: gathering information from users without their direct permission, for example by tracking web usage via cookies or by recording time stamps of when users interact with a website.