Data Extraction, Integration, and Process Documentation

Structured Query Language (SQL): a computer language used to interact with data (tables, records, and attributes) in a relational database.

It allows data to be created, retrieved in a useful way, updated, deleted, viewed, and extracted. To use it well you have to understand the data, so understanding the data dictionary and data models is important.

SQL queries: asking the database a question and receiving an answer based on the criteria placed in the SQL query.

A query is written to indicate which subset of data is intended for extraction. It is made up of SQL commands and database elements, which together form SQL clauses.

SQL Commands: language-specific words, such as SELECT, FROM, JOIN, GROUP BY, HAVING, WHERE, and ORDER BY. Case does not matter, but upper case is common because it differentiates commands from database elements.

Database elements: references to table names, attribute names, or criteria. They must be spelled exactly as they appear in the database. Case usually does not matter, but the first letter is typically capitalized.

SQL Clauses: phrases that begin with an SQL command and include database elements such as attribute or table names. A query usually begins with the SELECT and FROM clauses.

•SELECT: indicates which attributes the user wishes to view. SQL string functions can be used to adjust string values, combining or removing characters from a set of attributes. The order in which attributes are listed determines the order in which they are extracted.

(SELECT * = select all).

SUM(attribute)

COUNT(attribute)

AVG(attribute)

SELECT COUNT(Sales_Order_ID) FROM Sales_Orders
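
A minimal sketch of the SELECT ideas above, run through Python's built-in sqlite3 module so it can be tried without a database server. The table name, attribute names, and values are hypothetical, and the || concatenation operator is the SQLite spelling (other systems may use CONCAT instead).

```python
import sqlite3

# Hypothetical table and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Orders ("
    "Sales_Order_ID INTEGER, First_Name TEXT, Last_Name TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Lopez", 100.0), (2, "Ben", "Kim", 250.0), (3, "Ana", "Lopez", 50.0)],
)

# SELECT * returns every attribute of every record.
all_rows = conn.execute(
    "SELECT * FROM Sales_Orders ORDER BY Sales_Order_ID"
).fetchall()

# Attributes come back in the order listed in SELECT.
# The || operator is a string function that combines two text values.
names = conn.execute(
    "SELECT Amount, First_Name || ' ' || Last_Name "
    "FROM Sales_Orders ORDER BY Sales_Order_ID"
).fetchall()

# Aggregate functions collapse many records into a single value each.
count, total, average = conn.execute(
    "SELECT COUNT(Sales_Order_ID), SUM(Amount), AVG(Amount) FROM Sales_Orders"
).fetchone()

print(all_rows[0])   # first record, all attributes
print(names[0])      # Amount first, then the combined name
print(count, total, average)
conn.close()
```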

•FROM: tells the database management system which table to pull the data from.

•WHERE: behaves like a filter in Excel. The syntax is (WHERE attribute_name = criteria).

Attribute categories: text criteria must be in quotes, numbers do not need quotes, and dates need to be formatted correctly.
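
The three kinds of criteria can be sketched as follows, again using Python's sqlite3 module with a hypothetical table. One assumption to note: SQLite has no dedicated date type, so dates here are stored as ISO-formatted text (YYYY-MM-DD). The correct date format differs between database systems.

```python
import sqlite3

# Hypothetical table and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Orders ("
    "Sales_Order_ID INTEGER, Customer TEXT, Amount REAL, Order_Date TEXT)"
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 100.0, "2024-01-15"),
        (2, "Bolt", 250.0, "2024-02-03"),
        (3, "Acme", 50.0, "2024-02-20"),
    ],
)

# Text criteria go in quotes.
text_match = conn.execute(
    "SELECT Sales_Order_ID FROM Sales_Orders "
    "WHERE Customer = 'Acme' ORDER BY Sales_Order_ID"
).fetchall()

# Numeric criteria need no quotes.
number_match = conn.execute(
    "SELECT Sales_Order_ID FROM Sales_Orders WHERE Amount = 250"
).fetchall()

# Date criteria must match the format the database expects (ISO text here).
date_match = conn.execute(
    "SELECT Sales_Order_ID FROM Sales_Orders WHERE Order_Date = '2024-02-20'"
).fetchall()

print(text_match, number_match, date_match)
conn.close()
```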

•GROUP BY: indicates the attribute by which the aggregates should be grouped.

Steps:

•Determine which fields to aggregate

•Determine which field is preferred to group the aggregate by

•Place the descriptive attribute and the aggregate function in SELECT

•Indicate which table contains the attributes and place that table in FROM

•Add a third clause, GROUP BY, which also contains the descriptive attribute that is preferred to group the aggregate by.
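
The steps above can be sketched as a single query. This example assumes a hypothetical Sales_Orders table where Amount is the field to aggregate and Customer is the descriptive attribute to group by.

```python
import sqlite3

# Hypothetical table and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Orders (Sales_Order_ID INTEGER, Customer TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?)",
    [(1, "Acme", 100.0), (2, "Bolt", 250.0), (3, "Acme", 50.0)],
)

# SELECT holds the descriptive attribute and the aggregate function,
# FROM names the table, and GROUP BY repeats the descriptive attribute.
totals_by_customer = conn.execute(
    "SELECT Customer, SUM(Amount) "
    "FROM Sales_Orders "
    "GROUP BY Customer "
    "ORDER BY Customer"
).fetchall()

print(totals_by_customer)  # one row per customer with that customer's total
conn.close()
```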

•HAVING: filters the result on aggregate measures. It is similar to the WHERE clause, but it filters the grouped items further, after the aggregates have been calculated.

HAVING aggregate(attribute) = criteria

The aggregate can be any aggregate function, such as SUM, AVG, or COUNT.

The attribute is the field being aggregated.

The = can be replaced with another comparison operator, such as <, >, <=, or >=.
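
A short sketch of HAVING on a hypothetical Sales_Orders table. The first query keeps only the groups whose total exceeds 200, and the second keeps only the customers with exactly two orders.

```python
import sqlite3

# Hypothetical table and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Orders (Sales_Order_ID INTEGER, Customer TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?)",
    [(1, "Acme", 100.0), (2, "Bolt", 250.0), (3, "Acme", 50.0)],
)

# HAVING filters on the aggregate after the records have been grouped.
large_customers = conn.execute(
    "SELECT Customer, SUM(Amount) FROM Sales_Orders "
    "GROUP BY Customer HAVING SUM(Amount) > 200"
).fetchall()

# Any aggregate function and comparison operator can be used.
two_order_customers = conn.execute(
    "SELECT Customer, COUNT(Sales_Order_ID) FROM Sales_Orders "
    "GROUP BY Customer HAVING COUNT(Sales_Order_ID) = 2"
).fetchall()

print(large_customers, two_order_customers)
conn.close()
```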

•JOINs and ON:

INNER JOIN: retrieves data from two tables and returns only the records for which there is a match in both tables. The order of the tables is irrelevant.

INNER JOIN table2 ON table1.matching_key = table2.matching_key

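LEFT JOIN: also provides data for which there is not a match. The order of the tables matters: every record is pulled from table 1 (the left table), along with any matching data from table 2.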

LEFT JOIN table2 ON table1.matching_key = table2.matching_key
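
The difference between the two joins can be seen in a small sketch. The Customers and Sales_Orders tables and their values are hypothetical. Customer "Cargo" has no orders, so it appears only in the LEFT JOIN result, with an empty (None) amount.

```python
import sqlite3

# Hypothetical tables and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Customer_ID INTEGER, Name TEXT)")
conn.execute(
    "CREATE TABLE Sales_Orders ("
    "Sales_Order_ID INTEGER, Customer_ID INTEGER, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Bolt"), (3, "Cargo")],
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?)",
    [(10, 1, 100.0), (11, 2, 250.0), (12, 1, 50.0)],
)

# INNER JOIN: only records with a match in both tables.
inner_rows = conn.execute(
    "SELECT Customers.Name, Sales_Orders.Amount "
    "FROM Customers "
    "INNER JOIN Sales_Orders ON Customers.Customer_ID = Sales_Orders.Customer_ID "
    "ORDER BY Sales_Orders.Sales_Order_ID"
).fetchall()

# LEFT JOIN: every record from the left table (Customers), matched or not.
left_rows = conn.execute(
    "SELECT Customers.Name, Sales_Orders.Amount "
    "FROM Customers "
    "LEFT JOIN Sales_Orders ON Customers.Customer_ID = Sales_Orders.Customer_ID "
    "ORDER BY Customers.Customer_ID, Sales_Orders.Sales_Order_ID"
).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```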

Data Integration and Visualizing the Flow of processes and data

Data Integration: combining data from two or more sources.

Operational analysis: the relevant data is often stored across different databases or spreadsheets and needs to be integrated in order to make operational decisions or assess performance.
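
As one possible illustration of integrating a database with a spreadsheet, the sketch below matches a hypothetical Sales_Orders table against a hypothetical spreadsheet of customer regions exported as CSV. An in-memory string stands in for the CSV file, and Customer_ID is assumed to be the shared key.

```python
import csv
import io
import sqlite3

# Source 1: a database table of sales orders (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Orders ("
    "Sales_Order_ID INTEGER, Customer_ID INTEGER, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Sales_Orders VALUES (?, ?, ?)",
    [(10, 1, 100.0), (11, 2, 250.0), (12, 1, 50.0)],
)

# Source 2: a spreadsheet exported as CSV (a string stands in for the file).
spreadsheet = io.StringIO("Customer_ID,Region\n1,East\n2,West\n")
regions = {
    int(row["Customer_ID"]): row["Region"] for row in csv.DictReader(spreadsheet)
}

# Integrate the two sources on Customer_ID to get total sales per region.
sales_by_region = {}
for customer_id, amount in conn.execute(
    "SELECT Customer_ID, Amount FROM Sales_Orders"
):
    region = regions.get(customer_id, "Unknown")  # no match in the spreadsheet
    sales_by_region[region] = sales_by_region.get(region, 0.0) + amount

print(sales_by_region)
conn.close()
```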

Visualization:

•Flowchart:

Business Process Modeling Notation (BPMN): a standardized tool that provides a common set of principles for communicating business processes so that they can be documented, improved, and managed. It improves the effectiveness and efficiency of processes and helps examine them to identify process areas that can be automated. The more rule-bound a process is, the easier it is to automate. Diagrams are read left to right. Most common elements:

Flow: describes which organizations are involved in a process (pools) and, within those organizations, how the process is separated across roles/duties (swimlanes). For example, a sales process needs two pools, one for the seller and one for the customer, with swimlanes inside a pool to segregate duties.

Events: describe how a process begins or ends. They are not actions. They are drawn as circles, where the thickness of the circle's line depends on whether the event is a start, intermediate, or end event.

There is only one start event, regardless of the number of swimlanes.

The end event is a bold circle. There is usually only one, unless the diagram needs to show that a process can be cut short early or has multiple ways to end.

Intermediate events: not a required element, but they show something that changes the course of a process, such as a time delay or an error. They are drawn as a circle with two lines.

Connecting objects: sequence flows connect objects within a pool, and message flows connect two pools. A message flow is a dotted line showing how information is sent between pools.

Gateway: used when a task does not result in only one possible sequence flow, which also provides opportunities for analysis. It is drawn as a diamond and is usually labeled with a question.

Data Flow Diagram:

Process: any action that changes data, labeled with a short sentence and drawn as either a rectangle with rounded corners or a circle.

Data flow: the direction in which data is flowing, drawn as arrows.

Data storage: drawn as an open-ended rectangle.

External entity/terminator: drawn as a square.