Data Storage and Database Design
Data Storage: a type of technology designed to retain information and make it accessible to authorized users so they can perform business activities effectively and efficiently.
Types of Data Storage:
Operational Data Store (ODS): a repository of transactional data from multiple sources, often serving as an interim area between data sources and data warehouses. ODSs tend to be smaller, and data does not stay in them long.
•Transactional data such as customer orders, sales, and vendor payments
•Transactional data is system-related: it is generated by the systems that run day-to-day operations
Data warehouse: a large, centralized data repository used for reporting and analysis rather than for transactional purposes. Data is pulled in from the ODS or directly from source systems and then combined into a single repository.
Data Mart: a repository focused on a specific purpose, such as marketing or logistics, and often a subset of a data warehouse. The benefit is that the data is tailored to users' needs, which makes their work more effective.
Data Lake: a repository similar to a data warehouse, but it contains both structured and unstructured data in its natural or raw form.
On-premises storage: ODS, data warehouse, data mart, and data lake hosted on the organization's own hardware.
Cloud-based storage: allows data to be securely stored in the cloud. It is highly scalable and less costly and cumbersome than on-premises storage.
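The flow among these repositories can be sketched in Python. This is a minimal illustration under assumed names: the source systems, field names, and the functions `load_ods`, `load_warehouse`, and `build_mart` are hypothetical, and plain lists stand in for real storage.

```python
# Hypothetical source systems; names and fields are illustrative only.
sales_system = [
    {"order_id": 1, "customer": "Acme", "amount": 250.0, "dept": "marketing"},
    {"order_id": 2, "customer": "Globex", "amount": 90.0, "dept": "logistics"},
]
payments_system = [
    {"payment_id": 7, "vendor": "ShipCo", "amount": 40.0, "dept": "logistics"},
]


def load_ods(*sources):
    """Interim staging area: collect transactional records from several sources."""
    ods = []
    for name, rows in sources:
        for row in rows:
            ods.append({"source": name, **row})
    return ods


def load_warehouse(warehouse, ods):
    """Combine ODS records into one centralized repository, then empty the ODS."""
    warehouse.extend(ods)
    ods.clear()  # data does not stay in the ODS long
    return warehouse


def build_mart(warehouse, dept):
    """A data mart: the subset of the warehouse relevant to one purpose."""
    return [row for row in warehouse if row["dept"] == dept]


ods = load_ods(("sales", sales_system), ("payments", payments_system))
warehouse = load_warehouse([], ods)
logistics_mart = build_mart(warehouse, "logistics")
print(len(warehouse), len(logistics_mart), len(ods))  # 3 2 0
```

A data lake would differ mainly in accepting raw records of any shape (documents, images, logs) without first fitting them into a common structure.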
(Normalized) Relational Database Design: stores data across a series of related tables.
It is a method of storing data that provides reasonable assurance that data is complete and not redundant, that business rules and internal controls are enforced, and that communication and integration across business processes are supported.
Benefits:
•Completeness and no redundancy: because each fact is stored only once, the design decreases (rather than increases) the risk of data-entry errors and inconsistent data
•Business rule enforcement: the design helps enforce internal controls and business rules
•Communication and integration of business processes: the database is designed to support business processes across the organization, which improves communication across functional areas and integrates business processes more closely
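Business rule enforcement can be illustrated with Python's built-in `sqlite3` module. In this sketch the `customer` and `sales_order` tables and the rule that an order amount must be positive are hypothetical; the database itself rejects an order for a customer that does not exist and an order that breaks the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      DECIMAL(10, 2) NOT NULL CHECK (amount > 0)  -- hypothetical business rule
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, 250.00)")  # valid order


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


print(try_insert("INSERT INTO sales_order VALUES (101, 99, 10.00)"))  # no customer 99
print(try_insert("INSERT INTO sales_order VALUES (102, 1, -5.00)"))   # violates CHECK
```

SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; many other database systems enforce them by default.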
Data elements:
Tables (entities): each table stores data about one subject.
Columns (attributes) and rows (records): each should be unique and relevant to the table's purpose.
Field: the space at the intersection of a column and a row in which data is entered; the information in the field is the data value.
Three types of columns: primary keys (unique identifiers, usually non-descriptive, such as ID numbers), foreign keys (connect a table to the primary key of another table), and descriptive attributes.
Composite primary key: a combination of two or more attributes that together uniquely identify a record.
Flat file: a file that contains plain text, typically one record per line, with no relationships to other tables.
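The three column types and a composite primary key can be sketched in SQLite as follows. The `product`, `sales_order`, and `order_line` tables and their rows are hypothetical: `order_line` uses the pair (`order_id`, `product_id`) as its composite primary key, and each of those columns is also a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,   -- primary key: unique, non-descriptive number
    name       TEXT NOT NULL          -- descriptive attribute
);
CREATE TABLE sales_order (
    order_id   INTEGER PRIMARY KEY,
    order_date DATE NOT NULL
);
-- Each row is one product on one order: neither column alone is unique,
-- so the pair forms a composite primary key. Both columns are also foreign keys.
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES sales_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.execute("INSERT INTO sales_order VALUES (100, '2024-01-15')")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", [(100, 1, 3), (100, 2, 1)])

try:
    conn.execute("INSERT INTO order_line VALUES (100, 1, 5)")  # same (order, product) pair again
except sqlite3.IntegrityError as err:
    print("duplicate key rejected:", err)
```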
Data Dictionary:
A data dictionary describes how data are stored; it contains metadata (data about data). Understanding these basics is critical: database administrators use the dictionary to maintain the database, and analysts use it to identify the data they need. It makes work easier, supports better-informed decisions, and builds a wide understanding of what the correct information is.
•Examples of data dictionary contents: primary key, foreign key, data type, whether the field is required, field size, etc.
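Much of a data dictionary can be generated from the metadata a database already keeps. This sketch assumes SQLite and a hypothetical `data_dictionary` helper with hypothetical tables; it reads `PRAGMA table_info` and `PRAGMA foreign_key_list` to list each field's data type, whether it is required, and its key role. Field size appears only as part of the declared type, e.g., `CHAR(50)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        CHAR(50) NOT NULL,
    email       TEXT
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  DATE NOT NULL
);
""")


def data_dictionary(conn):
    """Build one entry per column from SQLite's own metadata (data about data)."""
    entries = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: f"{row[2]}.{row[4]}"
               for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            entries.append({
                "table": table,
                "field": name,
                "data_type": col_type,
                "required": bool(notnull or pk),  # treat primary keys as required
                "primary_key": bool(pk),
                "foreign_key": fks.get(name),
            })
    return entries


for entry in data_dictionary(conn):
    print(entry)
```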
Normalization: a database design technique that reduces data redundancy and eliminates undesirable characteristics (such as insertion, update, and deletion anomalies).
•Example: splitting data into smaller tables with a logical layout.
Steps:
•First normal form (1NF): two criteria. First, each cell in a table must contain only one piece of information. Second, every record must have a primary key.
•Second normal form (2NF): the table is in 1NF, and every non-key attribute describes the whole primary key (not just part of a composite key).
•Third normal form (3NF): the table is in 2NF, and each column describes ONLY the primary key, not another non-key column.
"We want the key, the whole key, and nothing but the key"
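The normal forms can be illustrated by decomposing a small unnormalized table. In this sketch the records and the `normalize` function are hypothetical: splitting the multi-valued `products` cell gives one value per cell (1NF), and moving customer name and city into their own table removes attributes that describe `customer_id` rather than the order's primary key (3NF).

```python
# Hypothetical unnormalized order records: the "products" cell holds several
# values (violates 1NF), and customer details repeat on every order.
raw_orders = [
    {"order_id": 100, "customer_id": 1, "customer_name": "Acme",
     "customer_city": "Austin", "products": "Widget, Gadget"},
    {"order_id": 101, "customer_id": 1, "customer_name": "Acme",
     "customer_city": "Austin", "products": "Widget"},
    {"order_id": 102, "customer_id": 2, "customer_name": "Globex",
     "customer_city": "Denver", "products": "Gadget"},
]


def normalize(rows):
    """Split one wide table into customers, orders, and order lines."""
    customers = {}    # customer_id -> attributes that depend only on customer_id
    orders = {}       # order_id -> customer_id (a foreign key)
    order_lines = []  # (order_id, product): composite key, one value per cell
    for r in rows:
        customers[r["customer_id"]] = {"name": r["customer_name"],
                                       "city": r["customer_city"]}
        orders[r["order_id"]] = r["customer_id"]
        # 1NF: turn the multi-valued cell into one row per product
        for product in r["products"].split(","):
            order_lines.append((r["order_id"], product.strip()))
    return customers, orders, order_lines


customers, orders, order_lines = normalize(raw_orders)
print(customers)
print(orders)
print(order_lines)
```

After the split, a change to a customer's city is made once in `customers` instead of on every order.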
Database Models and schemas:
Data model: a conceptual representation of the data structure in an information system; not restricted to relational databases.
•Levels: conceptual (least complex, no detail; captures structure and meaning), logical (the level of the data itself, such as attributes and system details; a great way to have a discussion), and physical (most complex; specifies how the data will be stored, etc.; completes the model so it can be built out)
Data Types:
Int: Whole number values
Char: string values (which may include digits, e.g., ZIP codes); the number in parentheses is the character limit, e.g., Char(10)
Date/time
Decimal
Text/string
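The data types above can be mirrored by simple validation checks. This is a minimal Python sketch, not how any particular database engine enforces types; the validator names, the sample record, and the assumption that dates use the `YYYY-MM-DD` format are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def is_int(value):
    """Int: whole numbers only (bool is excluded even though it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_char(value, limit):
    """Char(limit): a string whose length does not exceed the character limit."""
    return isinstance(value, str) and len(value) <= limit


def is_date(value):
    """Date: assumed to be written as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def is_decimal(value, places=2):
    """Decimal: a finite number with at most `places` digits after the point."""
    try:
        d = Decimal(str(value))
    except InvalidOperation:
        return False
    return d.is_finite() and d.as_tuple().exponent >= -places


def is_text(value):
    """Text/string: any string, no length limit."""
    return isinstance(value, str)


# Hypothetical record layout mapping each field to its data type check
schema = {
    "customer_id": is_int,
    "state_code": lambda v: is_char(v, 2),
    "order_date": is_date,
    "amount": is_decimal,
    "notes": is_text,
}
record = {"customer_id": 1, "state_code": "TX", "order_date": "2024-01-15",
          "amount": "250.00", "notes": "rush delivery"}
print({field: check(record[field]) for field, check in schema.items()})
```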
Database schema: a set of instructions that tells the database how to organize data so that it complies with the data model. It defines the structure and how data will be stored and accessed.
Star schema: common and simple; a central fact table surrounded by dimension tables.
Snowflake schema: dimensions are split into multiple related tables. More complex but flexible; a balance between a normalized design and a star schema.
Dimensional modeling: using a star or snowflake schema to make data easier to understand and report on. The database is not fully normalized, so a change must be updated everywhere the data has been duplicated.
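The update problem in a dimensional (denormalized) design can be shown with a small Python sketch. The table, the `rename_product` helper, and the product names are hypothetical: renaming a product means changing every row where its name was duplicated, whereas a normalized lookup needs a single change.

```python
# Hypothetical denormalized sales table: product_name is copied onto every row.
denormalized_sales = [
    {"sale_id": 1, "product_id": 10, "product_name": "Widget", "amount": 25.0},
    {"sale_id": 2, "product_id": 10, "product_name": "Widget", "amount": 40.0},
    {"sale_id": 3, "product_id": 20, "product_name": "Gadget", "amount": 15.0},
]


def rename_product(rows, product_id, new_name):
    """Update every duplicated copy of the name; return how many rows changed."""
    changed = 0
    for row in rows:
        if row["product_id"] == product_id:
            row["product_name"] = new_name
            changed += 1
    return changed


# Normalized alternative: the name lives in exactly one place.
products = {10: "Widget", 20: "Gadget"}

print(rename_product(denormalized_sales, 10, "Widget Pro"))  # 2
products[10] = "Widget Pro"  # a single change
```

Missing even one duplicated row would leave the data inconsistent, which is the trade-off dimensional models accept in exchange for simpler reporting.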
Fact table: holds the measures or metrics (facts). It has no descriptive elements, but it does have foreign keys that relate each row of the fact table to its corresponding dimensions to provide context.
Dimension table: descriptive or contextual data for the measures, such as dates and product names.
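A star schema can be sketched in SQLite with one fact table and three dimension tables. The table names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`) and the sample rows are hypothetical; the query joins the fact table to its dimensions through the foreign keys and aggregates the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date DATE, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table: numeric measures plus one foreign key per dimension
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER,
    revenue     DECIMAL(10, 2)
);
INSERT INTO dim_date VALUES (1, '2024-01-15', 'Jan'), (2, '2024-02-03', 'Feb');
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Electronics');
INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Denver');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 3, 75.00), (1, 2, 2, 1, 120.00), (2, 1, 2, 2, 50.00);
""")

# Typical star-schema query: join the fact table to its dimensions, then aggregate
report = conn.execute("""
    SELECT p.category, d.month, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date d    ON f.date_key = d.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
for row in report:
    print(row)
```

In a snowflake schema, `dim_product` would be normalized further, for example by moving `category` into its own table referenced from `dim_product`.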
