What is the key focus of analysis?
To predict trends using quantitative data.
What are the four levels of measurement?
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Nominal data
is categorical. It has no numerical value. It's not a number.
Examples:
-the types of pizza you sell
-meatball
-veggie
-cheese
(these are labels, they can't be added or subtracted)
Ordinal data
is ranked, but doesn't have a specific value.
Example:
-the size of pizza (small, medium, large)
(we can't add or subtract these but we can put them in sequential order)
- order (key word)
- no numerical value
Interval data
is numeric. You can add and subtract it. It has a sequential order. Each value is equally spaced from the previous value.
Example:
- drink sizes are interval (16 oz, 20 oz, 24 oz); they are equally spaced (4 oz apart)
Ratio data
is numeric. Your sales per day are ratio data.
Example:
- 10 sales of $12.99 apiece = $129.90
(the value is measured from a true zero)
To challenge the validity and reliability of data, ask two things.
Are there any outliers and are there any errors?
Outliers
Don't throw them out. Run the analysis both ways, with and without them.
Example:
Last week, you were closed for two days for renovations. Sales were zero for those two days. That's an outlier. You include the outliers to know how they are affecting your bottom line.
2 categories of errors
1. Random
2. Systematic
Random error
is something that happens just once and will not repeat over time.
Example:
if you are trying to find average delivery times and one delivery was affected by a four-hour Chicago traffic delay, that's random.
Systematic error
is when your deliveries (example) are slow and it is not by chance.
Example:
A delivery driver has nursed the fuel injector on his car for the past six months. It breaks down one out of every 20 deliveries he makes. This is Systematic error. It repeats itself.
Omission error
An error because something is missing.
Example:
A delivery driver didn't clock in or out for his delivery. That data will not be included in the study even though it's relevant.
-A data set with an omission error is defined as distorted.
out of range error
...
What is used to reduce errors?
a number of quality control tools.
Example:
a survey customers take only allows them to select responses from a list. That way, they can't type anything in wrong.
Treatment
Example: You want to make a crispier pizza, so you apply three different oils to three different pizzas and measure the crispiness of each crust.
Blind study
when the subjects don't know if they are receiving the treatment or a placebo (a harmless procedure prescribed for the psychological benefit of the recipient)
Double blind study
An experiment in which neither the participant nor the researcher knows whether the participant has received the treatment or the placebo (a harmless procedure prescribed for the psychological benefit of the recipient)
Descriptive
...
For companies to attract and retain their best customers they need a complete portrait of who they are. To develop this portrait companies turn to...
analytics
A manufacturer wants to maximize their factory output while specifically minimizing labor costs. What type of analytics might they employ to achieve this goal?
prescriptive analytics
What type of data error that occurs in measurement is constant within a data set and is sometimes caused by faulty equipment or bias?
measurement bias
A city government is trying to determine the national origins of its recent immigrant population. If a survey of the immigrant population is conducted in English, what type of error would be present in the data?
Omission
The use of Big Data is increasingly important to businesses in competitive markets. Which of the following characteristics is not true of big data?
can be analyzed with traditional spreadsheets
The Davenport-Kim three-stage model consists of framing the problem, solving the problem, and communicating results. Which two of the following are part of framing the problem stage?
-determine the scope of the problem
-review of previous findings
A healthcare provider is researching blood glucose levels before and after exercising. What two elements should be part of any experimental study such as this?
treatment procedures
Runners cover 26.2 miles in the Olympics marathon. What level of measurement is this?
ratio
What level of measurement is the type of cars produced in a Ford factory?
nominal
What level of measurement is a list of the 10 best cities in the U.S. to retire in?
Ordinal
What level of measurement are women's dress sizes (2,4,6, etc.)?
Interval
A local school board is studying the impact of a proposed change in testing on math scores. Bias can be introduced into the study by both students and teachers. Which research technique would eliminate this type of bias?
Double blind study
A company's product development team tests 3 new car waxes by waxing 5 cars with each wax and then running them through a car wash. They then record the number of washes it takes before the wax begins to deteriorate. What is the term for the five cars?
The experimental unit
Random sample
samples need to be the right size and represent your population.
Example:
1. What represents the population of NBA players?
A: 50 players selected from the 2017 roster.
Response Bias
You, the responder, feel persuaded to answer a certain way, or feel that there is only one way to answer.
Example:
Your teacher asks you to fill in a survey for teacher of the year.
-Indirectly, you're inclined to fill in something favoring that teacher without them asking you to.
Conscious Bias
The researcher creates the bias in question phrasing. Lawyers call it "leading."
-The "agree with me" approach.
Example:
- wouldn't you ...?
- don't you ...?
- of course this is the obvious choice
* it's important to test a survey before giving it out.
Blinding
* one of the tools researchers use to remove bias.
-removes previous experiences or perceptions.
Causality
To test for causality, make sure we understand every variable that could influence the outcome. Once every variable has been identified, then we can say what the cause is.
Example:
Correlation is not "cause."
Relationship is not "cause."
Probability
a chance/ every chance. The likelihood of something happening.
Multiplication rule
* (and, both, all)
The probability we use when we want to know the probability of 2 things happening at one time.
-usually, the combined probability will be smaller than either individual probability.
Addition rule
* (or, either)
Example:
what is the probability of walking up to a cash register that can give you either change or a digital receipt?
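A minimal sketch of the two rules in Python. The probabilities are hypothetical (they do not come from these notes), and the multiplication step assumes the two events are independent.

```python
# Hypothetical probabilities, for illustration only.
p_change = 0.40           # P(register gives change)
p_digital_receipt = 0.50  # P(register gives a digital receipt)

# Multiplication rule (AND), assuming the two events are independent.
p_both = p_change * p_digital_receipt

# Addition rule (OR): add the two probabilities, then subtract the overlap
# so that registers doing both are not counted twice.
p_either = p_change + p_digital_receipt - p_both

print(f"P(change AND receipt) = {p_both:.2f}")
print(f"P(change OR receipt)  = {p_either:.2f}")
```

Note that the AND result is smaller than either starting probability, while the OR result is larger.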
*Bayes Theorem
using a given probability to predict another probability.
Example:
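Given that you received change for $100 ("given" is the keyword for the known outcome), what's the probability you used the live checkout?
Calculations: outcome / (outcome + other probability), i.e. P(A given B) = P(B given A) x P(A) / [P(B given A) x P(A) + P(B given not A) x P(not A)].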
Central tendency
mean, median, mode
Outliers
an observation point that is distant from other observations.
-They are rare, but they can strongly affect the mean, variance, and range.
68.2%
probability of falling between -1 SD and +1 SD of the mean.
95.4%
probability of falling between -2 SD and + 2 SD of the mean.
standard deviation computed ...
take the square root of the variance to get the standard deviation.
99.7%
probability of falling between -3 SD and + 3 SD of the mean.
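These three percentages can be checked against the standard normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is made up for this example. The 1 SD figure comes out closer to 68.3% than 68.2%; both roundings appear in study materials.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Probability of falling between -k SD and +k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```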
Median
the middle number in a set of data.
(if even number set, add the middle 2 numbers and average it out to get the median).
-outliers don't affect the median.
Mean
the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores.
-outliers affect the mean.
Variance
tells us how far the data points are from the mean, on average.
- subtract the mean from each data point, then square the result to get the squared deviation for that point. Do the same for all points in the data, add the results, and divide by the number of data points to get the variance.
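The steps above can be sketched in Python as follows. The data set is borrowed from the median question later in these notes. Following the notes, this divides by the number of data points (population variance); a sample variance would divide by n - 1 instead.

```python
import math

data = [60, 41, 30, 15, 34, 30]

# Step 1: the mean.
mean = sum(data) / len(data)

# Step 2: subtract the mean from each point and square the result.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: add the squared deviations and divide by the number of points.
variance = sum(squared_deviations) / len(data)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```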
Mode
the most frequently occurring score(s) in a distribution
Range
the distance from the lowest point to the highest point.
Calculations:
highest # - lowest #
what measurement reflects the middle of the dataset?
Median
What measurement reflects what occurs most in the dataset?
Mode
z-score
a measure of where you are on the curve.
z = ((data in question) - mean) / SD
-the closer z-score is to zero, the closer you are to the average.
-the further z-score is from zero, the further you are from the average.
- positive z-score, you're higher than the average.
- negative z-score, you're lower than the average.
normal distribution
normal means symmetrical (50% of the data points on one side of the mean and 50% on the other). As a rule of thumb, a sample size of at least 30 is required to say it's a normal distribution of data.
Which measurement can be skewed by outliers?
Variance, Mean, and Range
Which types of decisions should use measurements that exclude outliers?
taxes
You are a boss and want to give your employees a bonus at work. Which measure of central tendency is best to use?
Median
There are two types of statistics (Analytics)
Descriptive and Inferential
Descriptive statistics are used to ______
Inform / Explanatory
Inferential statistics are used to ______
Predict / Trend
Name the 4 levels of measurement
(NOIR) Nominal, Ordinal, Interval, Ratio
Continuous data with unique zero point
Ratio
Orders data at equal distance apart
Interval
Place qualitative objects in some kind of order
Ordinal
Identify, Group, or Categorize
Nominal
Outliers create this type of error
out of range
Unpredictable error
Random Error - No correlation
Error may occur from missing data.
(Example: Space not filled in)
Omission Error - Distorted results
This error repeats itself
Systematic Error - Skewed results
What is the process of quality control?
Reduce/ minimize errors
All variable measurements and manipulations are under the researcher's control
Experimental study
Used when impractical or impossible to control the conditions of the study
Observational study
Participants are not told if they are in the treatment group or control group
Blind Study
The procedure the researcher applies to each subject
Treatments
Neither the treatment allocator nor the participants know who is in the treatment group or control group
Double blind study
Questions favor an outcome, or the interviewer asks questions that favor an outcome.
Information Bias
The average outcome (payoff) when the future includes scenarios that may or may not happen
Expected Monetary Value (EMV) Analysis
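A minimal sketch of an EMV calculation: weight each scenario's payoff by its probability and add the results. The scenarios, probabilities, and payoffs are hypothetical.

```python
# Hypothetical scenarios: (name, probability, payoff in dollars).
scenarios = [
    ("strong demand", 0.60, 10_000),
    ("weak demand", 0.30, 2_000),
    ("product recall", 0.10, -5_000),
]

# The probabilities of all scenarios should add up to 1.
total_probability = sum(p for _, p, _ in scenarios)

# EMV = sum of (probability x payoff) over every scenario.
emv = sum(p * payoff for _, p, payoff in scenarios)

print(f"total probability = {total_probability:.2f}")
print(f"EMV = ${emv:,.2f}")
```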
Observation points that are distant from other observations.
Outliers
Note: Can be included or excluded in analysis (causes skewness)
Bias that occurs from not selecting a random sample
Measurement bias
Bias introduced because respondents believe it will be beneficial if selected.
Conscious bias
Middle score for a set of data
Median
Note: Skewness does not affect the median
Tells us the number of standard deviations a data point is from the mean.
Z-score = (Value - Mean) / Std Deviation
If the average is the same for two groups, what will determine their difference?
Variance (Standard Deviation)
The spread of data in a sample. How far the data points are from the mean.
Standard deviation
Measure of central tendency that is influenced by the size of the values in a dataset
Mean
Note: Skewness does affect the mean
Each of the four equal groups into which a population can be divided
Quartiles
Measures the difference between the third and first quartile
IQR: Inter-quartile range
Note: Must be ordered in lowest to highest value
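A short sketch of finding the quartiles and the IQR with Python's standard `statistics.quantiles`. The data set is borrowed from the median question later in these notes. Quartile conventions differ between textbooks and tools, so other methods may give slightly different Q1 and Q3 values; this uses the function's default method.

```python
from statistics import quantiles

data = [60, 41, 30, 15, 34, 30]

# quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4)

# IQR: the difference between the third and first quartile.
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr}")
```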
Used to study the composition of a data set and examine the distribution
Box Plot
There are six toll booths to enter the highway. What probability does each toll booth worker have of getting the next customer?
1 customer and 6 booths = 1/6 or 16.7%
The order you pick your sample in does not matter
Combination
Picking employees for a shift. Order doesn't matter.
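A small sketch contrasting combinations with permutations, using `math.comb` and `math.perm` from the standard library. The staff size and shift size are hypothetical.

```python
import math

staff = 10       # hypothetical number of employees available
shift_size = 3   # hypothetical number needed for the shift

# Combination: order doesn't matter (who is on the shift).
ways_unordered = math.comb(staff, shift_size)

# Permutation: order matters (e.g. who opens, who runs the register, who closes).
ways_ordered = math.perm(staff, shift_size)

print(f"combinations: {ways_unordered}")
print(f"permutations: {ways_ordered}")
```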
When given P(A given B), you can use this to find P(B given A)
Bayes Theorem
You must know P(A), P(B), and P(A given B)
Use this rule when looking for one or the other event happening (OR)
Addition
A technique for minimizing total cost or maximizing profit based on constraints
Linear programming
A technique using more than one independent variable to predict a single dependent variable
Multiple regression
...
Correlation coefficient
Measures the goodness of fit in a regression analysis
R2 (R-Square)
A simple regression using time as the independent variable
Time series
...
Trend
Unforeseen circumstances causing random deviations
Irregularity
...
Cyclicality
...
Seasonality
Represents the probability that a variable falls within a certain range
Cumulative distribution
A list of all the different probabilities of each outcome that can occur
Probability Distribution
Z-score for 99% level of confidence
2.576
Z-score for 95% level of confidence
1.960
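These two critical values can be recovered from the standard normal curve. The sketch below assumes a two-sided confidence interval, so the leftover probability is split evenly between the two tails; the helper name `z_for_confidence` is made up for this example.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided critical z-score for a confidence level such as 0.95."""
    tail = (1 - level) / 2              # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)

print(f"95% confidence: z = {z95:.3f}")
print(f"99% confidence: z = {z99:.3f}")
```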
Measures of central tendency are approximately equal (Mean and Median)
Normal Distribution
Used to compare the mean of three or more groups
ANOVA
ANOVA uses this test statistic
F-value
(must be higher than critical value to reject the null)
T-test uses this test statistic
T-value
(must be higher than critical value to reject the null)
A correlation is weak if the coefficient is close to ____
Zero
A correlation is strong if the coefficient is close to ____
1 or -1
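A sketch of computing a correlation coefficient by hand (Pearson's r is assumed here, since the notes do not name a specific coefficient). The data points and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A result near zero would suggest a weak linear relationship; a result near 1 or -1 would suggest a strong one.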
Illustrates performance measurements over a period of time
Run Chart
Illustrates limits or constraints a process should not exceed
Control Chart
Visual tool to understand a process
Flowchart
Easy tool to collect data to create other charts
Check Sheet
Assists in brainstorming issues that are causing a problem
Cause and Effect Diagram
Not measurements!
Graphical display of a data set with one bar for each category
Histogram and Pareto
Graphical display of data set centered
Histogram
Graphical display of data set in highest to lowest order
Pareto
Used for potential relationships and correlation between variables
Scatter diagram
Can the seven tools be used independently?
Yes
What percent of quality problems does Ishikawa claim the seven tools can solve?
90% - 95%
Manufacturing approach to improving processes.
Six Sigma
...
Quality Control
Diagram demonstrating all of the elements that can influence a process before it starts.
SIPOC (Supplier - Input - Process - Output - Customer)
Plan - Do - Study - Act
Which step is a response to analytical results?
Act
Shows whether a result meets a requirement or not
Attribute
Shows how well a result meets the requirement
Variable
Variations accepted as the normal part of the process
Common cause variation
Variation from an abnormality causing large discrepancy in results
Special cause variation
Model of designing, analyzing, and scoring tests
IRT: Item Response Theory
How does government cost-benefit analysis differ from private sector cost-benefit analysis?
Government benefits aren't always money. Could be flood prevention or welfare.
Compares one individual's performance to other individuals
Norm Referenced
Compares an individual's performance to a standard score (Example: Cut Score 64%)
Criterion referenced
Management strategy that uses results as the central measurement of performance
RBM: Results Based Management
What is Big Data?
Very large data sets
Used to count ALL of the existing cases of a disease.
Prevalence
Used to count only the NEW cases of a disease.
Incidence (Incident rate)
Used to analyze if funding is worth the outcome of a project
Cost-benefit analysis
An online retailer selling workout apparel has a large increase in sales during December and declares that their weekly newspaper ad resulted in higher sales. What misuse of statistics may the retailer have used in making this decision?
Association and Causation
An educator collects eighth grade math scores from a local school and used this data to recommend curriculum changes for grades 8 - 12. What misuse of statistics may the educator have used in making this recommendation?
Not a representative sample
An economist wishes to study the distribution of household income in a Midwestern city. He randomly selects a sample from 12 households. He notices two large incomes in his sample. Which measure best represents the middle of the incomes?
Median
A cable company offers its customers both cable television and internet. What statistical rule should be used to determine the probability that customers will have both cable television and internet?
Multiplication
A department store is considering a new credit policy to reduce defaults on payments. Its records show that 95% of defaults have at least 2 late payments. Also, 3% of all customers default and 30% of those customers who have not defaulted have at least 2 late payments. What statistical rule should be used to find the probability that a customer will default given that at least 2 payments were late?
Bayes Theorem
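The question only asks which rule applies, but the numbers it gives are enough to work the problem. A sketch in Python, with variable names chosen for this example:

```python
# Figures taken from the question above.
p_default = 0.03                 # P(default)
p_late_given_default = 0.95      # P(at least 2 late payments | default)
p_late_given_no_default = 0.30   # P(at least 2 late payments | no default)

p_no_default = 1 - p_default

# Total probability of at least 2 late payments, across both groups.
p_late = (p_late_given_default * p_default
          + p_late_given_no_default * p_no_default)

# Bayes Theorem: P(default | at least 2 late payments).
p_default_given_late = p_late_given_default * p_default / p_late

print(f"P(default | 2+ late payments) = {p_default_given_late:.4f}")
```

The result is just under 9%, well above the 3% base rate but still far from certain, because most customers with late payments are non-defaulters.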
Based on quality checks at a plastic bag manufacturer, the breaking strength of their bags has a mean of 50.5 and a standard deviation of 1.6. A customer's test of the bags finds bag strength of 54.2 or less. Which statistical measure should be used to help determine the probability of the customer's test occurring?
z-score
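A sketch of the calculation using the figures from the question. It assumes bag strength is normally distributed, which the question does not state outright.

```python
from statistics import NormalDist

mean_strength = 50.5   # from the question
sd_strength = 1.6      # from the question
observed = 54.2        # the customer's test value

# z-score: how many standard deviations the value is from the mean.
z = (observed - mean_strength) / sd_strength

# Probability of a strength of 54.2 or less, assuming a normal distribution.
p_at_or_below = NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"P(strength <= {observed}) = {p_at_or_below:.4f}")
```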
The revenue of NBA teams ranges from $226 million for the New York Knicks to $92 million for the Milwaukee Bucks. Which statistic would measure how far each team's revenue is from the NBA revenue mean?
Variance
A local plumbing company wants to see how the number of weeks taken to receive payments is distributed. Which graphical analysis technique should they use?
Histogram
Given the following data set: 60, 41, 30, 15, 34, 30. What is the median?
32
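The answer can be checked with a short sketch: sort the values, then average the two middle values because the set has an even number of points. The function name `median_of` is made up for this example.

```python
def median_of(values):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

data = [60, 41, 30, 15, 34, 30]

print(sorted(data))     # the two middle values are 30 and 34
print(median_of(data))
```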