In a standard 52-card deck, where each card is assigned a numeric value, the mean of the 52 cards is 7.0 with a standard deviation of 3.742. To test if a representative sample can be attained, the deck is shuffled and 13 cards are randomly drawn, one at a time. Each card is recorded and then returned randomly to the deck before the next card is drawn. This process of drawing 13 random cards is repeated a total of 20 times, and the mean and standard deviation of each round are computed.
Which type of sampling is this an example of?
Randomization
Multiple comparison
Matched pairs
Bootstrap - correct answer Bootstrap
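The resampling procedure in this question can be sketched in a few lines. The sketch assumes card values 1 through 13 in each of four suits, which reproduces the stated mean of 7.0 and standard deviation of 3.742; the seed and the variable names are illustrative only.

```python
import random
import statistics

# Assumed deck: values 1..13 in each of four suits (52 cards).
deck = [value for value in range(1, 14) for _ in range(4)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

rounds = []
for _ in range(20):
    # Sampling with replacement: each card is returned before the next draw.
    sample = rng.choices(deck, k=13)
    rounds.append((statistics.mean(sample), statistics.stdev(sample)))

# Population figures for comparison with the 20 resampled rounds.
population_mean = statistics.mean(deck)
population_sd = statistics.pstdev(deck)
print(population_mean, round(population_sd, 3))
```

Each entry in `rounds` holds the mean and standard deviation of one 13-card resample, which can be compared with the population figures.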
A study is done to determine if the color of an apple affects the purchase rate by consumers at a grocery store. Purchase counts for three colors of apples are collected.
Which technique should be used to determine if there is a difference in purchase rate by color?
Rank-based
Randomization
t-test
ANOVA - correct answer ANOVA
A hospital network collects the amount of time it takes doctors to see patients.
Which two techniques are appropriate to compare the mean times of two subsets within this set of doctors?
Choose 2 answers
ANOVA
t-test
Logistic regression
Principal component analysis - correct answer ANOVA
t-test
A data analyst needs to determine whether antioxidants in blackberries can reduce age-related cognitive decline. Twenty-five adults between 65 and 80 are selected to eat blackberries for six months and be retested. The original scores are compared with the second scores to determine whether there was decline.
What is the appropriate model to test the hypothesis?
z test
ANOVA
Linear regression
Two-sample t-test - correct answer Two-sample t-test
Which assumption does a data analyst have to make when performing ANOVA on a dataset?
There are just two parameters that are being estimated.
The error terms follow a normal probability distribution.
The parameters are additive.
The explanatory variable is continuous. - correct answer The error terms follow a normal probability distribution.
Which two assumptions are made when the linear regression model is used?
Choose 2 answers
Critical value is based on Fisher's exact test.
Error terms are independent and identically distributed.
Variances are not equal.
Parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are constant. - correct answer Error terms are independent and identically distributed.
Parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are constant.
Which assumption is required to perform a two-sample t-test?
Equal variance
Large sample size
Small sum of residuals
Equal mean - correct answer Equal variance
Given the following statistical model:
y = 42 - 4.6 b
Match each element of the model with its corresponding term.
Answer options may be used more than once or not at all. Select your answers from the pull-down list.
y
42
-4.6
b - correct answer y: Dependent variable
42: Intercept
-4.6: Slope
b: Independent variable
Given the following generalized two-sample t-test equation:
$y_{i,j} = \mu_i + \varepsilon_{i,j}$
Classify each element in the equation.
Answer options may be used more than once or not at all.
Select your answers from the pull-down list. - correct answer $y_{i,j}$: Observed value
$\mu_i$: Mean response
$\varepsilon_{i,j}$: Error term
Given the following linear model:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Classify each element correctly.
Answer options may be used more than once or not at all.
Select your answer from the pull-down list. - correct answer $y_i$: Response variable
$\beta_0 + \beta_1 x_i$: Mean response
$x_i$: Explanatory variable
A study comparing graduating cumulative GPAs from students with two different majors is conducted to study the effect of each major on academic achievement. From the list of all graduating students in both majors, a random sample of 30 students from Major One and 30 students from Major Two are selected. The researcher decides to use an ANOVA test for this study.
What will the researcher need to calculate?
Sample size and sample mean
Group means and standard deviations
Population mean and standard deviation
Percentage of the population sampled and population mean - correct answer Group means and standard deviations
A data analyst is studying the relationship between gender and marriage using the result below:
t-Test: Paired Two Sample for Means
                               Married     Gender
Mean                           0.48        0.64
Variance                       0.26        0.24
Observations                   25          25
Pearson correlation            0.220176
Hypothesized mean difference   0
df                             24
t stat                         -1.28103
Confidence level               0.95
P(T<=t) one-tail               0.106213
t critical one-tail            1.710882
P(T<=t) two-tail               0.212425
t critical two-tail            2.063899
Given a confidence level of 95%, what can be concluded?
There is no significant difference in means between these two groups given the t-values.
There is a significant difference in means between these two groups, as our null hypothesis is $\mu_1 - \mu_2 = 0$.
There is a significant difference in means between these two groups, as the Pearson correlation is positive.
There is a significant difference in means between these two groups, as our alternative hypothesis is $\mu_1 > \mu_2$. - correct answer There is no significant difference in means between these two groups given the t-values.
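The reported t statistic can be reproduced from the summary values alone. This is a minimal sketch that assumes the standard paired-difference variance identity; the variable names are illustrative, and the critical value is copied from the output rather than computed.

```python
import math

# Summary values copied from the output above.
mean_1, mean_2 = 0.48, 0.64
var_1, var_2 = 0.26, 0.24
r = 0.220176
n = 25

# Variance of the paired differences: var1 + var2 - 2 * r * sd1 * sd2.
var_diff = var_1 + var_2 - 2 * r * math.sqrt(var_1 * var_2)
t_stat = (mean_1 - mean_2) / math.sqrt(var_diff / n)

t_critical_two_tail = 2.063899  # taken from the output, df = 24
reject_null = abs(t_stat) > t_critical_two_tail
print(round(t_stat, 3), reject_null)
```

Because the absolute t statistic falls below the two-tail critical value, the null hypothesis of equal means is not rejected.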
A junior data analyst intern is using linear regression to predict climate change. Given the following model:
GAT ~ 0 + 0.2 OR + 0.1 GOT + 0.4 EAPG + 0.05 (EAPG)(OR)
Given the following values:
• OR = 10
• GOT = 5
• EAPG = 4
What is the correct result?
0.05
0.61
1.0
6.1 - correct answer 6.1
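The arithmetic behind this answer can be checked by substituting the given values into the model. The function name `predict_gat` and its parameter names are hypothetical, chosen only for this sketch.

```python
def predict_gat(o_r: float, got: float, eapg: float) -> float:
    # GAT ~ 0 + 0.2 OR + 0.1 GOT + 0.4 EAPG + 0.05 (EAPG)(OR)
    return 0 + 0.2 * o_r + 0.1 * got + 0.4 * eapg + 0.05 * eapg * o_r

# 2.0 + 0.5 + 1.6 + 2.0
gat = predict_gat(o_r=10, got=5, eapg=4)
print(round(gat, 2))
```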
From hospital records, the following values for the named components are obtained:
                                     Treatment   Control
Average birth weight of infant (g)   3100        2750
SD                                   420         425
N                                    75          75
Birth weight is assumed to be normally distributed. The average birth weights in the two groups are tested using an independent two-group t-test. $H_0$ is that there is no difference in the birth weight between the treatment and control groups, with $\alpha = 0.05$.
Which formula can be used to solve the problem and show that the birth weight is significantly different?
t = =5.07
t = = 1.39
t = = 43.93
t = = 12.04 - correct answer t = =5.07
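The value 5.07 can be reproduced from the table. The sketch below assumes the unpooled standard error of the difference in means; because both groups have the same size, the pooled version gives the same number here. The function name is illustrative.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se

# Treatment vs. control values from the hospital records table.
t_stat = two_sample_t(3100, 420, 75, 2750, 425, 75)
print(round(t_stat, 2))
```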
A data analyst is doing an analysis using t-tests and gets the following output:
t-stat: 2.8
p-value: .006
Critical value: 1.2
Which conclusion can be drawn?
Accept the null hypothesis.
The critical value is too high.
Reject the null hypothesis.
The two population means are similar. - correct answer Reject the null hypothesis
A data analyst is examining the relationship between driving skill and texting among student drivers. A driving circuit is set up, and each student in a group is tested twice. The first test is performed while sending and receiving texts, and the second test is performed without texting. After each test, the number of cones hit by the student driver is recorded.
What are two appropriate conclusions that should be made?
Choose 2 answers
Dependent variable is continuous.
Independent variable is texting condition.
The distribution is skewed.
A t-test should be used. - correct answer Independent variable is texting condition.
A t-test should be used.
A shoe company wants to determine the effect of color and design on price for its new line of shoes.
Classify each of this study's variables as independent or dependent.
Answer options may be used more than once or not at all. Select your answers from the pull-down list. - correct answer Color Independent variable Independent variable
Design Independent variable Independent variable
Price Dependent variable Dependent variable
An insurance company tries to identify the reason for the maximum claim in a particular segment of a vehicle, and it identifies parameters as given in the following equation:
$\text{Claims segment} \sim \alpha + \beta_1\,\text{tire type} + \beta_2\,\text{vehicle size} + \beta_3\,\text{make year} + \beta_4\,\text{mileage} + \beta_5\,\text{vehicle weight} + \varepsilon$
In this equation, what is the vehicle size?
The response variable denoted by y; will work as constant
The confounding variable denoted by y; will determine the claim segment
The coefficient value denoted by β; will be nonsignificant
An explanatory variable that helps determine claim segment. - correct answer An explanatory variable that helps determine claim segment.
A company wants to hire employees who are most likely to be successful at selling the company's products. The data analyst needs to determine if intelligence and friendliness predict sales performance of current employees. The current employees are given tests to measure both factors, and their sales performance data is checked, giving the company three scores for each employee.
What is the criterion variable for this study?
Friendliness
Intelligence
Strategy results
Sales performance - correct answer Sales performance
A researcher builds a multiple regression model. While testing the model, the researcher tests the model assumptions.
Which assumption violates the standard model assumptions?
The variance of the model is constant.
The model terms are additive.
The error terms are dependent.
The model parameters are constant. - correct answer The error terms are dependent.
Given the following regression equation:
Height = 96 + 3.4(Age) - 42(Weight)
When adding an interaction term between Age and Weight, what will be the outcome?
The intercept value will be dependent on Age and Weight.
The unique effect of Age on Height will not be possible to determine.
The model will no longer be considered a linear regression equation.
The effect of Age on Height will depend on Weight. - correct answer The effect of Age on Height will depend on Weight.
Given the following multiple linear regression model:
$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{1,i} \cdot x_{2,i}$
What is the definition of $x_{1,i} \cdot x_{2,i}$?
Response term
Interaction term
Independent term
Dependent term - correct answer Interaction term
Given the following equation:
$R^2_{adj} = 1 - \frac{n - 1}{n - p}\,(1 - R^2)$
What does n represent?
Sample size
Degrees of freedom
Number of coefficients
Population size - correct answer Sample size
What is the correct formula for AIC?
$n\log(\hat{\sigma}^2) + 2p$
$n\log(\hat{\sigma}^2) + p\log(n)$
$n\log(\hat{\sigma}^2) + p\log(p)$
$n\log(\hat{\sigma}) + 2p$ - correct answer $n\log(\hat{\sigma}^2) + 2p$
A clinical trial designed to evaluate the efficacy of a new drug to increase HDL cholesterol is conducted. One hundred patients enroll in the study and are randomized to receive either a new drug or a placebo. A multiple regression analysis is used to assess effect modification. Let T = Treatment (1 = new drug, 0 = placebo), M = male gender (1 = Yes, 0 = No), and TM = the interaction of treatment and male gender.
Given the following results:
Independent variable     Regression coefficient   T       p-value
Intercept                39.24                    65.89   0.0001
T (Treatment)            -0.36                    -0.43   0.6711
M (Male)                 -0.18                    -0.13   0.8991
TM (Treatment x Male)    6.55                     3.37    0.0011
What is the linear regression model?
T = 65.89 - 0.13(M) + 3.37(TM)
HDL = 39.24 - 0.36(T) - 0.18(M) + 6.55(TM)
T = 65.89 - 0.43(T) - 0.13(M) + 3.37(TM)
HDL = 65.89 - 0.43(T) - 0.13(M) + 3.37(TM)
HDL = -0.36(T) - 0.18(M) - correct answer HDL = 39.24 - 0.36(T) - 0.18(M) + 6.55(TM)
Using the following models:
Model 1: Score = 2035 - 0.277 (Attendance) + 1299 (Extra Curriculum)
Model 2: Score = 7291 - 0.098 (Attendance) + 2325 (Extra Curriculum)
Model 3: Score = 3745 - 0.179 (Attendance) + 3200 (Extra Curriculum) + 3122 (Recommendation)
Assuming other values are constant in all three models, what will changing one unit in Extra Curriculum result in?
Changes in score by 3200 in Model 2
Changes in score by 3200 in Model 3
Changes in score by 3200 and 1299 in Models 1 and 3, respectively
Changes in score by 3745 in Model 3 - correct answer Changes in score by 3200 in Model 3
Given the following model, which is being used to predict the number of calories a cyclist will burn during a workout:
Calories = 188 + 4(Distance) - .05(Age) + .4(Age * Distance)
How many calories is a 30-year-old cyclist predicted to burn after 10 miles?
344.5
346.5
364.5
366.5 - correct answer 346.5
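The prediction follows from substituting Age = 30 and Distance = 10 into the model, including the interaction term. The function name `predict_calories` is hypothetical and used only for this sketch.

```python
def predict_calories(distance: float, age: float) -> float:
    # Calories = 188 + 4(Distance) - .05(Age) + .4(Age * Distance)
    return 188 + 4 * distance - 0.05 * age + 0.4 * (age * distance)

# 188 + 40 - 1.5 + 120
calories = predict_calories(distance=10, age=30)
print(round(calories, 1))
```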
A scientist is studying which factors will affect tree height. Data from 100 trees is collected. The independent variables are the trees' location, ambient temperature, and soil type. Given the following multiple linear regression model, which is being used to analyze the results:
Tree height ~ 3.52 tree location + 5.23 ambient temperature + 2.15 soil type
The scientist tests the model's validity using the F-test and calculates the $R^2$ value. The $R^2$ value for this model is 0.93, $F_{2,97}$ is 8.02, and $F_{97,2}$ is 99.3.
Which conclusion can be made?
The p-value is equal to 0.07, indicating a bad fit for the model.
$F_{2,97}$ is less than $F_{97,2}$, indicating a good fit for the model.
$R^2$ is less than 1, indicating a bad fit for the model.
$R^2$ is approaching 1, indicating a good fit for the model. - correct answer $R^2$ is approaching 1, indicating a good fit for the model.
How is the Residual Sum of Squares (SSE) defined? - correct answer $\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
A researcher is studying the effect of two medicines that treat insomnia. The medicines are tested on 40 people with insomnia.
Given the results in the following table:
Medication     Cured   Not Cured
Medication 1   18      7
Medication 2   11      4
The researcher wants to compare the effect of the two medicines and obtain a reasonably accurate estimate of the p-value for this test.
Which method should the researcher use?
Two-sided hypothesis test
Fisher's exact test
ANOVA
Odds ratio - correct answer Fisher's exact test
A data analyst working on a remote island has no available computing technology, but needs to compute a p-value for three categories of data about local flora and fauna.
Which technique is appropriate for this scenario?
Chi-square test
Student's t-test
ANOVA
Fisher's exact test - correct answer Chi-square test
A researcher is studying whether presence or absence of a particular gene in mice is tied to fur color (white or brown). A total of 1000 mice were included in this study. There is presently no scientific evidence known to tie the presence or absence of this gene to either white or brown fur in mice.
Which technique should be used to evaluate the results of this study?
Simulation study
Fisher's exact test
Two-sided hypothesis test
Chi-square test - correct answer Two-sided hypothesis test
Given the following dataset:
            Did Not Heal    Did Heal      Total
Bandage     N     %         N     %       N     %
Elastic     23    82%       2     13%     25    58%
Inelastic   5     18%       13    87%     18    42%
Total       28    65%       15    35%     43    100%
Why is the Fisher's exact test more appropriate to use than the chi-square test?
One of the cell values misrepresents the population.
One of the cell values represents a different distribution than the others.
One of the cell values overestimates the sample.
One of the cell values is too small. - correct answer One of the cell values is too small.
What is the major assumption needed to perform Fisher's exact test?
The observations follow a hypergeometric distribution.
The proportion between the two groups should be larger than 5%.
The number for each cell should be smaller than 5.
The row and column counts must be known. - correct answer The row and column counts must be known
An analyst is tasked with testing a case with a proportion of success that is close to 0%. A chi-square test is used.
What is the assumption for this method?
Contingency tables are limited to 2×2 sizing
Expected count is calculated based on the distribution following the alternative hypothesis
Separate samples for each proportion should be assumed
Large sample to allow for expected values greater than 5 - correct answer Large sample to allow for expected values greater than 5
Given the following dataset:
         Red   Green   Total
Male     4     8       12
Female   9     6       15
Total    13    14      27
What is the p-value for this data using Fisher's exact test?
0.05
0.25
0.75
1.0 - correct answer 0.25
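The p-value of about 0.25 can be reproduced by enumerating every table with the same margins. This is a minimal sketch that assumes the common two-sided convention of summing all tables no more probable than the observed one; the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the tables that are no more likely than the observed table.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Male: 4 red, 8 green; Female: 9 red, 6 green.
p_value = fisher_exact_two_sided(4, 8, 9, 6)
print(round(p_value, 2))
```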
Given the following study results:
         Red   Green   Total
Male     4     8       12
Female   9     6       15
Total    13    14      27
What is the chi-square statistic for the 95% confidence interval?
0.17
0.58
1.90
3.90 - correct answer 1.90
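The statistic of 1.90 comes from comparing each observed count with its expected count under independence. The sketch below assumes the Pearson statistic without a continuity correction; the function name is illustrative.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Male: 4 red, 8 green; Female: 9 red, 6 green.
chi2 = chi_square_statistic([[4, 8], [9, 6]])
print(round(chi2, 2))
```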
A data analyst is testing the effects of two types of medication.
Given the following results:
Medication 1 Medication 2
Cured 58 39
Uncured 22 57
The analyst performs the chi-square test. What is the expected count, rounded to the nearest whole number, of uncured that were given medication 1?
26
36
44
53 - correct answer 36
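The expected count for a cell is its row total times its column total, divided by the grand total. A short sketch of that calculation, with illustrative dictionary keys:

```python
observed = {
    "cured": {"med1": 58, "med2": 39},
    "uncured": {"med1": 22, "med2": 57},
}

row_total = sum(observed["uncured"].values())  # all uncured patients
col_total = observed["cured"]["med1"] + observed["uncured"]["med1"]  # all on medication 1
grand_total = sum(sum(row.values()) for row in observed.values())

# Expected count of uncured patients on medication 1 under independence.
expected = row_total * col_total / grand_total
print(round(expected))
```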
Given the following chart:
$H_0: \mu = \mu_0$ OR $H_0: \mu = \mu_0$ OR $H_0: \mu = \mu_0$
$H_a: \mu \neq \mu_0$ OR $H_a: \mu > \mu_0$ OR $H_a: \mu < \mu_0$
What is the researcher using this chart for?
Two-sided t-test
Fisher's exact test
Wald's confidence interval
Chi-square test - correct answer Two-sided t-test
A researcher uses a chi-square test to find $\chi^2 = 6.78$.
Which conclusion can be drawn from this?
Due to the small value of $\chi^2$, observed and expected counts are the same.
The $\chi^2$ value does not allow conclusions, and a p-value must be calculated.
The $\chi^2$ value is greater than 3.84, thus the null hypothesis should be rejected.
Due to the small value of $\chi^2$, observed and expected counts are different. - correct answer The $\chi^2$ value does not allow conclusions, and a p-value must be calculated.
A data analyst studying dieting habits selects 24 athletes and divides them into groups by age (over 35, and 35 or under) and by whether they are dieting (dieting, or not dieting). Ten of the athletes are dieters, and 12 of the athletes are over 35.
The hypothesis is: Athletes in both age groups are equally likely to diet.
Given the following tables:
Data Table      35 and Under   Over 35   Totals
Dieting         1              9         10
Not Dieting     11             3         14
Column Totals   12             12        24
Expected        35 and Under   Over 35   Totals
Dieting         0              10        10
Not Dieting     12             2         14
Column Totals   12             12        24
What can be concluded from the study?
Reject the null hypothesis
Variables are not fixed
Distribution is normal
p-values are approximate - correct answer Reject the null hypothesis
An analyst is asked to evaluate whether a treatment reduces a side effect by comparing the treatment and placebo groups on whether or not each patient experienced the side effect.
What is the proper method to use in this analysis?
Polynomial regression
Logistic regression
PCA
Linear regression - correct answer Logistic regression
The following dataset and p-values were obtained during the interaction assessment stage:
Variable                  Abbreviation   Coding
Type of Hospital          HT             1 = public, 0 = private
Size of Hospital          HS             1 = large, 0 = small
Degree of contamination   CT             1 = contaminated, 0 = clean
Income                    INC            Continuous
Gender                    GN             1 = female, 0 = male
Drinker                   DRK            1 = Yes, 0 = No
Factors    p-value
HT x CT    0.011
HT x GN    0.025
HT x INC   0.057
HT x HS    0.124
HT x DRK   0.092
CT         0.097
Using the hierarchy principle, which two variables should be retained for all further models considered?
Choose 2 answers
GN
INC
HS
CT
DRK - correct answer GN
CT
Given the following table that shows the summary of admission data:
Summary of Admission
Admit         N      %
0             500    38%
1             800    62%
Total         1300   100%
Scores        Observations   Mean    Std. Dev.
GRE           1298           151.2   15.2
GPA           1300           3.54    0.392
Gender        N      %
Male          600    46%
Female        700    54%
Total         1300   100%
Schools       N      %
LSA           200    15%
Engineering   500    38%
Dental        80     6%
Medical       200    15%
Nursing       220    17%
Law           100    8%
Total         1300   100%
Which statement describes the appropriate modeling for admission and relevant predictors?
Use ANOVA with Admit as a continuous variable
Use GRE score as a ratio in linear regression model
Use department as a nominal response variable
Use logistic regression with Admit as a dependent variable - correct answer Use logistic regression with Admit as a dependent variable
The following table consists of two variables, tier of undergraduate school attended and then the predicted probability of getting into a top-tier graduate school:
Tier Probability
1 .517
2 .352
3 .219
4 .185
Based on this table, which conclusion is correct?
A student from a second-tier undergraduate school has an 87% chance of getting into a top-tier graduate school.
A student from a first-tier undergraduate school has a 52% chance of getting rejected by a top-tier graduate school.
A student from a first-tier undergraduate school has a 52% chance of getting into a top-tier graduate school.
A student from a second-tier undergraduate school has an 87% chance of getting rejected by a top-tier graduate school. - correct answer A student from a first-tier undergraduate school has a 52% chance of getting into a top-tier graduate school.
The following table consists of two variables, the tier of undergraduate school and the predicted probability of getting into a graduate school:
Tier Probability
1 .517
2 .352
3 .219
4 .185
Which conclusion is correct?
A student from tier 1 undergraduate school has a 52% chance of getting into a graduate school.
A student from tier 2 undergraduate school has a 48% chance of getting into a graduate school.
A student from tier 3 undergraduate school has a 35% chance of getting into a graduate school.
A student from tier 4 undergraduate school has a 22% chance of getting into a graduate school. - correct answer A student from tier 1 undergraduate school has a 52% chance of getting into a graduate school
A student is fitting a simple logistic regression model with a single independent variable. Wald's test is used to validate the coefficient. The maximum likelihood estimate of the coefficient is 7.399, with a standard error of 3.710. The critical z-value is 1.96.
Based on these data, what is the 95% confidence interval for the regression parameter in question?
(0.86, 29.94)
(0.64, 11.25)
(0.21, 7.53)
(0.13, 14.67) - correct answer (0.13, 14.67)
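The interval is the estimate plus or minus 1.96 standard errors. A short sketch of that Wald interval, with illustrative variable names:

```python
estimate = 7.399       # maximum likelihood estimate of the coefficient
standard_error = 3.710
z = 1.96               # value for a 95% interval

margin = z * standard_error
lower, upper = estimate - margin, estimate + margin
print(round(lower, 2), round(upper, 2))
```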
A data analyst is performing logistic regression on elements with high multicollinearity.
What is an assumption of logistic regression that is being violated?
Normally distributed variables
Variable independence
High $R^2$
Categorical data type - correct answer Variable independence
What is the correct assumption for the logistic regression model?
The variance for the error terms must be fixed.
The error terms for the variables are independent.
The G-statistic for the independent variable is 0.
The independent variable must follow a normal distribution. - correct answer The error terms for the variables are independent.
Given the following logistic regression model:
$\ln\left(\frac{p}{1 - p}\right) = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k + E$
Which two statements about this logistic regression model are correct?
Choose 2 answers
$B_1$ is a regression coefficient.
$E$ is the confidence interval.
$x_1$ is an explanatory variable.
$\ln(p / (1 - p))$ is the odds of success.
$p$ is a probability value between -1 and 1. - correct answer $B_1$ is a regression coefficient.
$x_1$ is an explanatory variable.
Given the following logistic regression function:
$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i$
What is $\pi_i$?
Maximum likelihood estimation
Probability of success
Logit function
Odds ratio - correct answer Probability of success
Given statistical output in the following table:
              Coefficient   Std. Error   Z Score
(Intercept)   -4.13         0.964        -4.28423
Variable 1    0.08          0.026        3.076923
Variable 2    0.185         0.057        3.245614
Variable 3    0.939         0.225        4.173333
Variable 4    0.001         0.004        0.25
Variable 5    0.043         0.01         4.3
Which variable is considered nonsignificant?
Variable 1
Variable 2
Variable 3
Variable 4
Variable 5 - correct answer Variable 4
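Each Z score in the table is the coefficient divided by its standard error. The sketch below recomputes them and flags any variable whose absolute Z score falls below 1.96, assuming a two-sided test at the 5% level (a threshold the question does not state explicitly).

```python
# (coefficient, standard error) pairs copied from the table.
terms = {
    "Variable 1": (0.08, 0.026),
    "Variable 2": (0.185, 0.057),
    "Variable 3": (0.939, 0.225),
    "Variable 4": (0.001, 0.004),
    "Variable 5": (0.043, 0.01),
}

Z_CRITICAL = 1.96  # assumed two-sided 5% threshold

z_scores = {name: coef / se for name, (coef, se) in terms.items()}
nonsignificant = [name for name, z in z_scores.items() if abs(z) < Z_CRITICAL]
print(nonsignificant)
```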
Given the following model:
$PC_i = 1465a + 784.6b - 1475c + 0.49d$
Which component has the largest influence on $PC_i$?
a
b
c
d - correct answer c
Given the following PCA model:
                 Principal Components
Variable         1        2       3
Climate          -0.029   0.13    0.301
Housing          0.581    0.321   0.011
Health           0.365    0.239   0.025
Education        -0.231   0.142   -0.615
Arts             0.680    0.402   0.112
Economy          0.987    0.123   -0.131
Crime            0.520    0.901   0.519
Transportation   -0.711   0.034   0.234
Recreation       0.670    0.235   0.516
Which two observations are accurate?
Choose 2 answers
Principal component 1 has five strongly positively correlated variables.
Principal component 1 has a very strong correlation with Climate.
Principal component 2 has the strongest correlation with Arts.
Principal component 2 has the strongest correlation with Transportation.
Principal component 3 has almost no correlation with Housing and Health. - correct answer Principal component 1 has five strongly positively correlated variables.
Principal component 3 has almost no correlation with Housing and Health.
Given the following dataset:
                         Comp. 1   Comp. 2   Comp. 3   Comp. 4   Comp. 5
Standard Deviation       1.87      1.339     0.5203    0.3887    0.0878
Proportion of Variance   0.611     0.313     0.0473    0.0264    0.0013
Cumulative Proportion    0.611     0.924     0.9722    0.9986    1
Which two conclusions can be made?
Choose 2 answers
The third component explains 97% of the behavior in the dataset.
The second component explains 31% of the behavior in the dataset.
The first two components explain 92% of the behavior in the dataset.
The first component explains 70% of the behavior in the dataset. - correct answer The second component explains 31% of the behavior in the dataset.
The first two components explain 92% of the behavior in the dataset.
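The proportion-of-variance row can be recovered from the standard deviations, since each component's variance is its standard deviation squared. A brief sketch with illustrative variable names; small differences from the table in the third decimal place come from rounding.

```python
from itertools import accumulate

std_devs = [1.87, 1.339, 0.5203, 0.3887, 0.0878]

# Variance of each component, then its share of the total variance.
variances = [sd**2 for sd in std_devs]
total_variance = sum(variances)
proportions = [v / total_variance for v in variances]
cumulative = list(accumulate(proportions))

for index, (share, running) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Comp. {index}: proportion={share:.3f}, cumulative={running:.3f}")
```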
Given the following results:
Component Eigenvalues Proportion
1 3.29 0.366
2 1.21 0.135
3 1.1 0.123
4 0.9 0.101
5 0.86 0.096
6 0.56 0.063
7 0.48 0.054
8 0.31 0.035
9 0.25 0.027
What conclusion can be made regarding the variances?
The first four principal components account for more than 80% of the variation.
The cumulative sum of the proportions exceeds 1 since it is not standardized.
The first principal component explains about 37% of the variation in the data.
There is a sharp drop after the second component. - correct answer The first principal component explains about 37% of the variation in the data.
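Accumulating the proportion column shows why the 80% option does not hold. A short sketch using the proportions from the table; the variable names are illustrative.

```python
from itertools import accumulate

proportions = [0.366, 0.135, 0.123, 0.101, 0.096, 0.063, 0.054, 0.035, 0.027]

# Running total of variance explained by the first k components.
cumulative = list(accumulate(proportions))
first_four = cumulative[3]
print(round(first_four, 3), first_four > 0.80)
```

The first four components together account for roughly 72.5% of the variation, which is below 80%.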
Following the result of a PCA, the five principal components were determined as shown in the following table:
                 PCA
Variable         1       2        3        4        5
Climate          0.158   0.069    0.801    0.377    0.041
Housing          0.384   0.139    0.08     0.197    -0.581
Health           0.411   -0.372   -0.019   0.113    0.031
Crime            0.259   0.474    0.128    -0.042   0.692
Transportation   0.375   -0.141   -0.141   -0.43    0.191
Education        0.274   -0.452   -0.241   0.457    0.224
Arts             0.474   -0.104   0.011    -0.147   0.012
Recreation       0.353   0.292    0.042    -0.404   -0.306
Economy          0.164   0.541    -0.507   0.476    -0.037
Which conclusion can be made?
The magnitude of the coefficients depends on the variance.
For the second principal component, Climate is significant.
The negative coefficients indicate that they are not contributing.
The correlation between PC1 and Arts is 0.474. - correct answer The correlation between PC1 and Arts is 0.474.