
WGU Statistics exam with correct answers 2024.

In a standard 52-card deck, where each card is assigned a numeric value, the mean of the 52 cards is 7.0 with a standard deviation of 3.742. To test if a representative sample can be attained, the deck is shuffled and 13 cards are randomly drawn, one at a time. Each card is recorded and then returned randomly to the deck before the next card is drawn. This process of drawing 13 random cards is repeated a total of 20 times, and the mean and standard deviation of each round are computed. Which type of sampling is this an example of?
Options: Randomization | Multiple comparison | Matched pairs | Bootstrap
Correct answer: Bootstrap

A study is done to determine if the color of an apple affects the purchase rate by consumers at a grocery store. Purchase counts for three colors of apples are collected. Which technique should be used to determine if there is a difference in purchase rate by color?
Options: Rank-based | Randomization | t-test | ANOVA
Correct answer: ANOVA

A hospital network collects the amount of time it takes doctors to see patients. Which two techniques are appropriate to compare the mean times of two subsets within this set of doctors? Choose 2 answers.
Options: ANOVA | t-test | Logistic regression | Principal component analysis
Correct answers: ANOVA | t-test

A data analyst needs to determine whether antioxidants in blackberries can reduce age-related cognitive decline. Twenty-five adults between 65 and 80 are selected to eat blackberries for six months and are then retested. The original scores are compared with the second scores to determine whether there was a decline. What is the appropriate model to test the hypothesis?
Options: z-test | ANOVA | Linear regression | Two-sample t-test
Correct answer: Two-sample t-test

Which assumption does a data analyst have to make when performing ANOVA on a dataset?
Options: There are just two parameters that are being estimated. | The error terms follow a normal probability distribution. | The parameters are additive. | The explanatory variable is continuous.
Correct answer: The error terms follow a normal probability distribution.

Which two assumptions are made when the linear regression model is used? Choose 2 answers.
Options: Critical value is based on Fisher's exact test. | Error terms are independent and identically distributed. | Variances are not equal. | Parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are constant.
Correct answers: Error terms are independent and identically distributed. | Parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are constant.

Which assumption is required to perform a two-sample t-test?
Options: Equal variance | Large sample size | Small sum of residuals | Equal mean
Correct answer: Equal variance

Given the following statistical model: y = 42 - 4.6b. Match each element of the model with its corresponding term. Answer options may be used more than once or not at all.
Correct answer: y: Dependent variable | 42: Intercept | -4.6: Slope | b: Independent variable

Given the following generalized two-sample t-test equation: $y_{i,j} = \mu_i + \varepsilon_{i,j}$. Classify each element in the equation. Answer options may be used more than once or not at all.
Correct answer: $y_{i,j}$: Observed value | $\mu_i$: Mean response | $\varepsilon_{i,j}$: Error term

Given the following linear model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. Classify each element correctly. Answer options may be used more than once or not at all.
Correct answer: $y_i$: Response variable | $\beta_0 + \beta_1 x_i$: Mean response | $x_i$: Explanatory variable

A study comparing graduating cumulative GPAs from students with two different majors is conducted to study the effect of each major on academic achievement. From the list of all graduating students in both majors, a random sample of 30 students from Major One and 30 students from Major Two is selected. The researcher decides to use an ANOVA test for this study. What will the researcher need to calculate?
Options: Sample size and sample mean | Group means and standard deviations | Population mean and standard deviation | Percentage of the population sampled and population mean
Correct answer: Group means and standard deviations

A data analyst is studying the relationship between gender and marriage using the result below:

t-Test: Paired Two Sample for Means
                                 Married    Gender
Mean                             0.48       0.64
Variance                         0.26       0.24
Observations                     25         25
Pearson correlation              0.220176
Hypothesized mean difference     0
df                               24
t Stat                           -1.28103
Confidence level                 0.95
P(T<=t) one-tail                 0.106213
t Critical one-tail              1.710882
P(T<=t) two-tail                 0.212425
t Critical two-tail              2.063899

Given a confidence level of 95%, what can be concluded?
Options: There is no significant difference in means between these two groups given the t-values. | There is a significant difference in means between these two groups as our null hypothesis is $\mu_1 - \mu_2 = 0$. | There is a significant difference in means between these two groups as the Pearson correlation is positive. | There is a significant difference in means between these two groups as our alternative hypothesis is $\mu_1 > \mu_2$.
Correct answer: There is no significant difference in means between these two groups given the t-values. (|t Stat| = 1.28 is below the two-tail critical value of 2.064, and the two-tail p-value of 0.212 is above 0.05.)

A junior data analyst intern is using linear regression to predict climate change. Given the following model:
GAT ~ 0 + 0.2 OR + 0.1 GOT + 0.4 EAPG + 0.05 (EAPG)(OR)
and the values OR = 10, GOT = 5, and EAPG = 4, what is the correct result?
Options: 0.05 | 0.61 | 1.0 | 6.1
Correct answer: 6.1 (0.2(10) + 0.1(5) + 0.4(4) + 0.05(4)(10) = 2 + 0.5 + 1.6 + 2)

From hospital records, the following values are obtained:

                                      Treatment    Control
Average birth weight of infant (g)    3100         2750
SD (g)                                420          425
N                                     75           75

Birth weight is assumed to be normally distributed. The average birth weights in the two groups are tested using an independent two-group t-test. $H_0$ is that there is no difference in birth weight between the treatment and control groups, with $\alpha = 0.05$. Which formula can be used to solve the problem and show that the birth weight is significantly different?
Options: t = 5.07 | t = 1.39 | t = 43.93 | t = 12.04
Correct answer: t = 5.07, i.e. $t = \frac{3100 - 2750}{\sqrt{420^2/75 + 425^2/75}} \approx 5.07$

A data analyst is doing an analysis using t-tests and gets the following output: t-stat: 2.8, p-value: 0.006, critical value: 1.2. Which conclusion can be drawn?
Options: Accept the null hypothesis. | The critical value is too high. | Reject the null hypothesis. | The two population means are similar.
Correct answer: Reject the null hypothesis.

A data analyst is examining the relationship between driving skill and texting among student drivers. A driving circuit is set up and a group of students are each tested twice. The first test is performed while sending and receiving texts, and the second test is recorded without texting. After each test, the number of cones hit by the student driver is recorded. What are two appropriate conclusions that should be made? Choose 2 answers.
Options: Dependent variable is continuous. | Independent variable is texting condition. | The distribution is skewed. | A t-test should be used.
Correct answers: Independent variable is texting condition. | A t-test should be used.

A shoe company wants to determine the effect of color and design on price for its new line of shoes. Classify each of this study's variables as independent or dependent. Answer options may be used more than once or not at all.
Correct answer: Color: Independent variable | Design: Independent variable | Price: Dependent variable

An insurance company wants to identify what drives the maximum claim in a particular vehicle segment, and it identifies the parameters in the following equation:
$\text{Claims segment} \sim \alpha + \beta_1\,\text{tire type} + \beta_2\,\text{vehicle size} + \beta_3\,\text{make year} + \beta_4\,\text{mileage} + \beta_5\,\text{vehicle weight} + \varepsilon$
In this equation, what is the vehicle size?
Options: The response variable denoted by y; will work as constant | The confounding variable denoted by y; will determine the claim segment | The coefficient value denoted by β; will be nonsignificant | An explanatory variable that helps determine claim segment.
Correct answer: An explanatory variable that helps determine claim segment.

A company wants to hire employees who are most likely to be successful at selling the company's products. The data analyst needs to determine if intelligence and friendliness predict the sales performance of current employees. The current employees are given tests to measure both factors, and their sales performance data is checked, giving the company three scores for each employee. What is the criterion variable for this study?
Options: Friendliness | Intelligence | Strategy results | Sales performance
Correct answer: Sales performance

A researcher builds a multiple regression model and, while testing it, checks the model assumptions. Which statement violates the standard model assumptions?
Options: The variance of the model is constant. | The model terms are additive. | The error terms are dependent. | The model parameters are constant.
Correct answer: The error terms are dependent.

Given the following regression equation: Height = 96 + 3.4(Age) - 42(Weight). When adding an interaction term between Age and Weight, what will be the outcome?
Options: The intercept value will be dependent on Age and Weight. | The unique effect of Age on Height will not be possible to determine. | The model will no longer be considered a linear regression equation. | The effect of Age on Height will depend on Weight.
Correct answer: The effect of Age on Height will depend on Weight.

Given the following linear regression model: $y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{1,i} \cdot x_{2,i}$. What is the definition of $x_{1,i} \cdot x_{2,i}$?
Options: Response term | Interaction term | Independent term | Dependent term
Correct answer: Interaction term

Given the following equation: $R^2_{adj} = 1 - \left(\frac{n - 1}{n - p}\right)(1 - R^2)$. What does n represent?
Options: Sample size | Degrees of freedom | Number of coefficients | Population size
Correct answer: Sample size

What is the correct formula for AIC?
Options: $n\log(\hat{\sigma}^2) + 2p$ | $n\log(\hat{\sigma}^2) + p\log(n)$ | $n\log(\hat{\sigma}^2) + p\log(p)$ | $n\log(\hat{\sigma}) + 2p$
Correct answer: $n\log(\hat{\sigma}^2) + 2p$ ($n\log(\hat{\sigma}^2) + p\log(n)$ is the BIC, which replaces the AIC penalty $2p$ with the heavier penalty $p\log(n)$.)

A clinical trial designed to evaluate the efficacy of a new drug to increase HDL cholesterol is conducted. One hundred patients enroll in the study and are randomized to receive either the new drug or a placebo. A multiple regression analysis is used to assess effect modification. Let T = Treatment (1 = new drug, 0 = placebo), M = male gender (1 = yes, 0 = no), and TM = the interaction of treatment and male gender. Given the following results:

Independent variable      Regression coefficient    t        p-value
Intercept                 39.24                     65.89    0.0001
T (Treatment)             -0.36                     -0.43    0.6711
M (Male)                  -0.18                     -0.13    0.8991
TM (Treatment x Male)     6.55                      3.37     0.0011

What is the linear regression model?
Options: T = 65.89 - 0.13(M) + 3.37(TM) | HDL = 39.24 - 0.36(T) - 0.18(M) + 6.55(TM) | T = 65.89 - 0.43(T) - 0.13(M) + 3.37(TM) | HDL = 65.89 - 0.43(T) - 0.13(M) + 3.37(TM) | HDL = -0.36(T) - 0.18(M)
Correct answer: HDL = 39.24 - 0.36(T) - 0.18(M) + 6.55(TM)

Using the following models:
Model 1: Score = 2035 - 0.277(Attendance) + 1299(Extra Curriculum)
Model 2: Score = 7291 - 0.098(Attendance) + 2325(Extra Curriculum)
Model 3: Score = 3745 - 0.179(Attendance) + 3200(Extra Curriculum) + 3122(Recommendation)
Assuming all other values are held constant in all three models, what will a one-unit change in Extra Curriculum result in?
Options: Changes in score by 3200 in Model 2 | Changes in score by 3200 in Model 3 | Changes in score by 3200 and 1299 in Models 1 and 3, respectively | Changes in score by 3745 in Model 3
Correct answer: Changes in score by 3200 in Model 3

Given the following model, which is being used to predict the number of calories a cyclist will burn during a workout:
Calories = 188 + 4(Distance) - 0.05(Age) + 0.4(Age * Distance)
How many calories is a 30-year-old cyclist predicted to burn after 10 miles?
Options: 344.5 | 346.5 | 364.5 | 366.5
Correct answer: 346.5 (188 + 4(10) - 0.05(30) + 0.4(30)(10) = 188 + 40 - 1.5 + 120)

A scientist is studying which factors affect tree height. Data from 100 trees are collected. The independent variables are the trees' location, ambient temperature, and soil type. The following multiple linear regression model is used to analyze the results:
Tree height ~ 3.52 tree location + 5.23 ambient temperature + 2.15 soil type
The scientist tests the model's validity using the F-test and calculates the $R^2$ value. The $R^2$ value for this model is 0.93, $F_{2,97}$ is 8.02, and $F_{97,2}$ is 99.3. Which conclusion can be made?
Options: The p-value is equal to 0.07, indicating a bad fit for the model. | $F_{2,97}$ is less than $F_{97,2}$, indicating a good fit for the model. | $R^2$ is less than 1, indicating a bad fit for the model. | $R^2$ is approaching 1, indicating a good fit for the model.
Correct answer: $R^2$ is approaching 1, indicating a good fit for the model.

How is the Residual Sum of Squares (SSE) defined?
Correct answer: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

A researcher is studying the effect of two medicines that treat insomnia. The medicine is tested on 40 people with insomnia, with the following results:

Medication      Cured    Not cured
Medication 1    18       7
Medication 2    11       4

The researcher wants to compare the effect of the two medicines and get a comparatively accurate estimate of the p-value for this test. Which method should the researcher use?
Options: Two-sided hypothesis test | Fisher's exact test | ANOVA | Odds ratio
Correct answer: Fisher's exact test

A data analyst working on a remote island has no available computing technology but needs to compute a p-value for three categories of data about local flora and fauna. Which technique is appropriate for this scenario?
Options: Chi-square test | Student's t-test | ANOVA | Fisher's exact test
Correct answer: Chi-square test

A researcher is studying whether the presence or absence of a particular gene in mice is tied to fur color (white or brown). A total of 1000 mice were included in this study. There is presently no known scientific evidence tying the presence or absence of this gene to either white or brown fur in mice. Which technique should be used to evaluate the results of this study?
Options: Simulation study | Fisher's exact test | Two-sided hypothesis test | Chi-square test
Correct answer: Two-sided hypothesis test

Given the following dataset:

              Did not heal    Did heal     Total
Bandage       N     %         N     %      N     %
Elastic       23    82%       2     13%    25    58%
Inelastic     5     18%       13    87%    18    42%
Total         28    65%       15    35%    43    100%

Why is Fisher's exact test more appropriate to use than the chi-square test?
Options: One of the cell values misrepresents the population. | One of the cell values represents a different distribution than the others. | One of the cell values overestimates the sample. | One of the cell values is too small.
Correct answer: One of the cell values is too small.

What is the major assumption needed to perform Fisher's exact test?
Options: The observations follow a hypergeometric distribution. | The proportion between the two groups should be larger than 5%. | The number for each cell should be smaller than 5. | The row and column counts must be known.
Correct answer: The row and column counts must be known.

An analyst is tasked with testing a case with a proportion of success that is close to 0%. A chi-square test is used. What is the assumption for this method?
Options: Contingency tables are limited to 2×2 sizing | Expected count is calculated based on the distribution following the alternative hypothesis | Separate samples for each proportion should be assumed | Large sample to allow for expected values greater than 5
Correct answer: Large sample to allow for expected values greater than 5

Given the following dataset:

          Red    Green    Total
Male      4      8        12
Female    9      6        15
Total     13     14       27

What is the p-value for this data using Fisher's exact test?
Options: 0.05 | 0.25 | 0.75 | 1.0
Correct answer: 0.25

Given the following study results:

          Red    Green    Total
Male      4      8        12
Female    9      6        15
Total     13     14       27

What is the chi-square statistic at the 95% confidence level?
Options: 0.17 | 0.58 | 1.90 | 3.90
Correct answer: 1.90

A data analyst is testing the effects of two types of medication, with the following results:

           Medication 1    Medication 2
Cured      58              39
Uncured    22              57

The analyst performs the chi-square test. What is the expected count, rounded to the nearest whole number, of uncured patients who were given Medication 1?
Options: 26 | 36 | 44 | 53
Correct answer: 36 (row total 79 × column total 80 / grand total 176 ≈ 35.9)

Given the following chart:
$H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
OR $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$
OR $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$
What is the researcher using this chart for?
Options: Two-sided t-test | Fisher's exact test | Wald's confidence interval | Chi-square test
Correct answer: Two-sided t-test

A researcher uses a chi-square test to find $\chi^2 = 6.78$. Which conclusion can be drawn from this?
Options: Due to the small value of $\chi^2$, observed and expected counts are the same. | The $\chi^2$ value does not allow conclusions, and a p-value must be calculated. | The $\chi^2$ value is greater than 3.84, thus the null hypothesis should be rejected. | Due to the small value of $\chi^2$, observed and expected counts are different.
Correct answer: The $\chi^2$ value does not allow conclusions, and a p-value must be calculated.

A data analyst studying dieting habits selects 24 athletes and divides them into groups by age (over 35, and 35 or under) and by whether they are dieting (dieting or not dieting). Ten of the athletes are dieters, and 12 of the athletes are over 35. The hypothesis is: athletes in both age groups are equally likely to diet. Given the following tables:

Data table
                 35 and under    Over 35    Totals
Dieting          1               9          10
Not dieting      11              3          14
Column totals    12              12         24

Expected
                 35 and under    Over 35    Totals
Dieting          0               10         10
Not dieting      12              2          14
Column totals    12              12         24

What can be concluded from the study?
Options: Reject the null hypothesis | Variables are not fixed | Distribution is normal | p-values are approximate
Correct answer: Reject the null hypothesis

An analyst is asked to analyze how effective a treatment is at reducing a side effect, comparing the treatment group with the placebo group on whether or not a side effect occurred. What is the proper method to use in this analysis?
Options: Polynomial regression | Logistic regression | PCA | Linear regression
Correct answer: Logistic regression

During the interaction assessment stage using the following dataset:

Variable                   Abbreviation    Coding
Type of hospital           HT              1 = public, 0 = private
Size of hospital           HS              1 = large, 0 = small
Degree of contamination    CT              1 = contaminated, 0 = clean
Income                     INC             Continuous
Gender                     GN              1 = female, 0 = male
Drinker                    DRK             1 = yes, 0 = no

Factor      p-value
HT x CT     0.011
HT x GN     0.025
HT x INC    0.057
HT x HS     0.124
HT x DRK    0.092
CT          0.097

Using the hierarchy principle, which two variables should be retained for all further models considered? Choose 2 answers.
Options: GN | INC | HS | CT | DRK
Correct answers: GN | CT (HT x CT and HT x GN are the interactions significant at 0.05, so their lower-order components CT and GN must stay in the model.)

Given the following table that shows the summary of admission data:

Summary of admission
Admit    N       %
0        500     38%
1        800     62%
Total    1300    100%

Scores    Observations    Mean     Std. dev.
GRE       1298            151.2    15.2
GPA       1300            3.54     0.392

Gender    N       %
Male      600     46%
Female    700     54%
Total     1300    100%

Schools        N       %
LSA            200     15%
Engineering    500     38%
Dental         80      6%
Medical        200     15%
Nursing        220     17%
Law            100     8%
Total          1300    100%

Which statement describes the appropriate modeling for admission and relevant predictors?
Options: Use ANOVA with Admit as a continuous variable | Use GRE score as a ratio in linear regression model | Use department as a nominal response variable | Use logistic regression with Admit as a dependent variable
Correct answer: Use logistic regression with Admit as a dependent variable

The following table consists of two variables, the tier of undergraduate school attended and the predicted probability of getting into a top-tier graduate school:

Tier    Probability
1       0.517
2       0.352
3       0.219
4       0.185

Based on this table, which conclusion is correct?
Options: A student from a second-tier undergraduate school has an 87% chance of getting into a top-tier graduate school. | A student from a first-tier undergraduate school has a 52% chance of getting rejected by a top-tier graduate school. | A student from a first-tier undergraduate school has a 52% chance of getting into a top-tier graduate school. | A student from a second-tier undergraduate school has an 87% chance of getting rejected by a top-tier graduate school.
Correct answer: A student from a first-tier undergraduate school has a 52% chance of getting into a top-tier graduate school.

The following table consists of two variables, the tier of undergraduate school and the predicted probability of getting into a graduate school:

Tier    Probability
1       0.517
2       0.352
3       0.219
4       0.185

Which conclusion is correct?
Options: A student from a tier 1 undergraduate school has a 52% chance of getting into a graduate school. | A student from a tier 2 undergraduate school has a 48% chance of getting into a graduate school. | A student from a tier 3 undergraduate school has a 35% chance of getting into a graduate school. | A student from a tier 4 undergraduate school has a 22% chance of getting into a graduate school.
Correct answer: A student from a tier 1 undergraduate school has a 52% chance of getting into a graduate school.

A student is fitting a simple logistic regression model with a single independent variable. Wald's test is used to validate the coefficient. The maximum likelihood estimate for it is 7.399, with a standard error of 3.710. The critical z-score is 1.96. Based on this data, what is the 95% confidence interval for the regression parameter in question?
Options: (0.86, 29.94) | (0.64, 11.25) | (0.21, 7.53) | (0.13, 14.67)
Correct answer: (0.13, 14.67) (7.399 ± 1.96 × 3.710 = 7.399 ± 7.272)

A data analyst is performing logistic regression on elements with high multicollinearity. What is an assumption of logistic regression that is being violated?
Options: Normally distributed variables | Variable independence | High $R^2$ | Categorical data type
Correct answer: Variable independence

What is the correct assumption for the logistic regression model?
Options: The variance for the error terms must be fixed. | The error terms for the variables are independent. | The G-statistic for the independent variable is 0. | The independent variable must follow a normal distribution.
Correct answer: The error terms for the variables are independent.

Given the following logistic regression model:
$\ln\left(\frac{p}{1 - p}\right) = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k + E$
Which two statements about this logistic regression model are correct? Choose 2 answers.
Options: $B_1$ is a regression coefficient. | E is the confidence interval. | $x_1$ is an explanatory variable. | $\ln(p/(1 - p))$ is the odds of success. | p is a probability value between -1 and 1.
Correct answers: $B_1$ is a regression coefficient. | $x_1$ is an explanatory variable.

Given the following logistic regression function: $\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i$. What is $\pi_i$?
Options: Maximum likelihood estimation | Probability of success | Logit function | Odds ratio
Correct answer: Probability of success

Given the statistical output in the following table:

               Coefficient    Std. error    z score
(Intercept)    -4.13          0.964         -4.28423
Variable 1     0.08           0.026         3.076923
Variable 2     0.185          0.057         3.245614
Variable 3     0.939          0.225         4.173333
Variable 4     0.001          0.004         0.25
Variable 5     0.043          0.01          4.3

Which variable is considered nonsignificant?
Options: Variable 1 | Variable 2 | Variable 3 | Variable 4 | Variable 5
Correct answer: Variable 4 (|z| = 0.25 is far below 1.96; every other term has |z| above 3.)

Given the following model: $PC_i = 1465a + 784.6b - 1475c + 0.49d$. Which component has the largest influence on $PC_i$?
Options: a | b | c | d
Correct answer: c (|-1475| is the largest coefficient magnitude.)

Given the following PCA model:

                  Principal component
Variable          1         2        3
Climate           -0.029    0.13     0.301
Housing           0.581     0.321    0.011
Health            0.365     0.239    0.025
Education         -0.231    0.142    -0.615
Arts              0.680     0.402    0.112
Economy           0.987     0.123    -0.131
Crime             0.520     0.901    0.519
Transportation    -0.711    0.034    0.234
Recreation        0.670     0.235    0.516

Which two observations are accurate? Choose 2 answers.
Options: Principal component 1 has five strongly positive correlated variables. | Principal component 1 has a very strong correlation with Climate. | Principal component 2 has the strongest correlation with Arts. | Principal component 2 has the strongest correlation with Transportation. | Principal component 3 has almost no correlation with Housing and Health.
Correct answer: Principal component 3 has almost no correlation with Housing and Health.

Given the following dataset:

                          Comp. 1    Comp. 2    Comp. 3    Comp. 4    Comp. 5
Standard deviation        1.87       1.339      0.5203     0.3887     0.0878
Proportion of variance    0.611      0.313      0.0473     0.0264     0.0013
Cumulative proportion     0.611      0.924      0.9722     0.9986     1

Which two conclusions can be made? Choose 2 answers.
Options: The third component explains 97% of the behavior in the dataset. | The second component explains 31% of the behavior in the dataset. | The first two components explain 92% of the behavior in the dataset. | The first component explains 70% of the behavior in the dataset.
Correct answers: The second component explains 31% of the behavior in the dataset. | The first two components explain 92% of the behavior in the dataset.

Given the following results:

Component    Eigenvalue    Proportion
1            3.29          0.366
2            1.21          0.135
3            1.1           0.123
4            0.9           0.101
5            0.86          0.096
6            0.56          0.063
7            0.48          0.054
8            0.31          0.035
9            0.25          0.027

What conclusion can be made regarding the variances?
Options: The first four principal components account for more than 80% of the variation. | The cumulative sum of the proportions exceeds 1 since it is not standardized. | The first principal component explains about 37% of the variation in the data. | There is a sharp drop after the second component.
Correct answer: The first principal component explains about 37% of the variation in the data.
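The eigenvalue question above can be checked by dividing each eigenvalue by the sum of all eigenvalues. A minimal Python sketch using only the standard library; the eigenvalues are copied from the table, and because they are rounded to two decimals, the recomputed proportions may differ from the table's Proportion column in the third decimal.

```python
from itertools import accumulate

# Eigenvalues copied from the results table (components 1-9).
eigenvalues = [3.29, 1.21, 1.1, 0.9, 0.86, 0.56, 0.48, 0.31, 0.25]

def variance_explained(values):
    """Return (proportion, cumulative proportion) lists for PCA eigenvalues."""
    total = sum(values)
    proportions = [v / total for v in values]
    return proportions, list(accumulate(proportions))

proportions, cumulative = variance_explained(eigenvalues)
for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"PC{k}: {p:.3f} (cumulative {c:.3f})")
```

PC1 comes out at about 0.37, while the first four components reach only about 0.73, which is why "about 37%" is supported and "more than 80% for the first four" is not.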
Following the result of a PCA, five principal components were retained, as shown in the following table:

                  Principal component
Variable          1        2         3         4         5
Climate           0.158    0.069     0.801     0.377     0.041
Housing           0.384    0.139     0.08      0.197     -0.581
Health            0.411    -0.372    -0.019    0.113     0.031
Crime             0.259    0.474     0.128     -0.042    0.692
Transportation    0.375    -0.141    -0.141    -0.43     0.191
Education         0.274    -0.452    -0.241    0.457     0.224
Arts              0.474    -0.104    0.011     -0.147    0.012
Recreation        0.353    0.292     0.042     -0.404    -0.306
Economy           0.164    0.541     -0.507    0.476     -0.037

Which conclusion can be made?
Options: The magnitude of the coefficients depends on the variance. | For the second principal component, Climate is significant. | The negative coefficients indicate that they are not contributing. | The correlation between PC1 and Arts is 0.474.
Correct answer: The correlation between PC1 and Arts is 0.474.
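The first question's sampling scheme (13 draws with replacement, repeated 20 times) can be sketched in Python. This assumes card values Ace = 1 through King = 13 across four suits; the question does not say how values are assigned, but this assignment reproduces its population mean of 7.0 and standard deviation of 3.742. The helper name `bootstrap_rounds` and the seed are illustrative only.

```python
import random
import statistics

# Assumption: Ace = 1 ... King = 13, four cards of each value.
DECK = [value for value in range(1, 14) for _suit in range(4)]

def bootstrap_rounds(deck, sample_size=13, rounds=20, seed=None):
    """Repeat `rounds` times: draw `sample_size` cards with replacement
    (each card goes back before the next draw) and record mean and SD."""
    rng = random.Random(seed)
    results = []
    for _ in range(rounds):
        sample = rng.choices(deck, k=sample_size)  # sampling with replacement
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results

if __name__ == "__main__":
    print(f"population mean {statistics.mean(DECK):.1f}, SD {statistics.pstdev(DECK):.3f}")
    for i, (mean, sd) in enumerate(bootstrap_rounds(DECK, seed=42), start=1):
        print(f"round {i:2d}: mean {mean:.2f}, SD {sd:.2f}")
```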
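The GAT and calorie questions both evaluate a fitted model that contains an interaction term. A small Python sketch, assuming a model can be written as an intercept plus (coefficient, predictors) pairs; `predict` is a hypothetical helper, not part of any library.

```python
def predict(intercept, terms, values):
    """Evaluate a fitted linear model at the given predictor values.

    `terms` is a list of (coefficient, names) pairs: a single name is a main
    effect, and two names form an interaction (the product of the predictors).
    """
    total = intercept
    for coefficient, names in terms:
        product = 1.0
        for name in names:
            product *= values[name]
        total += coefficient * product
    return total

# GAT ~ 0 + 0.2 OR + 0.1 GOT + 0.4 EAPG + 0.05 (EAPG)(OR)
gat_terms = [(0.2, ("OR",)), (0.1, ("GOT",)), (0.4, ("EAPG",)), (0.05, ("EAPG", "OR"))]
gat = predict(0.0, gat_terms, {"OR": 10, "GOT": 5, "EAPG": 4})

# Calories = 188 + 4(Distance) - 0.05(Age) + 0.4(Age * Distance)
calorie_terms = [(4, ("Distance",)), (-0.05, ("Age",)), (0.4, ("Age", "Distance"))]
calories = predict(188, calorie_terms, {"Distance": 10, "Age": 30})

print(round(gat, 2), round(calories, 2))  # 6.1 346.5
```

Because of the 0.4(Age * Distance) term, each additional mile adds 4 + 0.4(Age) calories in the calorie model, so the effect of Distance depends on Age; this is the same idea as the Age-by-Weight interaction question.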
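For the birth-weight question, the t statistic is the difference in means divided by its standard error. This sketch uses the unpooled standard error; with n = 75 in both groups, the pooled-variance version gives the same value. The function name is illustrative.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for the difference of two independent means,
    using the unpooled standard error sqrt(s1^2/n1 + s2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Treatment: 3100 g (SD 420, n 75); control: 2750 g (SD 425, n 75).
t = two_sample_t(3100, 420, 75, 2750, 425, 75)
print(f"t = {t:.2f}")  # t = 5.07
```

A value of 5.07 is far above the two-sided 5% critical value (roughly 1.98 for about 148 degrees of freedom), so $H_0$ is rejected.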
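The adjusted $R^2$ and AIC questions turn on two formulas, and AIC is easy to confuse with BIC. A sketch of all three, assuming natural logarithms, Gaussian errors with additive constants dropped, and p = number of estimated parameters; the inputs at the bottom are hypothetical and not taken from any question.

```python
import math

def adjusted_r2(r2, n, p):
    """R^2_adj = 1 - (n - 1)/(n - p) * (1 - R^2); n = sample size, p = parameters."""
    return 1 - (n - 1) / (n - p) * (1 - r2)

def aic(n, sigma2_hat, p):
    """AIC = n log(sigma^2_hat) + 2p (natural log, constants dropped)."""
    return n * math.log(sigma2_hat) + 2 * p

def bic(n, sigma2_hat, p):
    """BIC = n log(sigma^2_hat) + p log(n): same fit term, heavier penalty once n >= 8."""
    return n * math.log(sigma2_hat) + p * math.log(n)

# Hypothetical values, for illustration only.
n, sigma2_hat, p = 100, 2.5, 4
print(round(adjusted_r2(0.93, n, p), 4))
print(round(aic(n, sigma2_hat, p), 2), round(bic(n, sigma2_hat, p), 2))
```

Both criteria share the fit term $n\log(\hat{\sigma}^2)$; only the penalty differs, so lower values are better under either one, but BIC favors smaller models more strongly on large samples.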
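Fisher's exact p-value for the Red/Green table can be reproduced exactly with integer binomial coefficients (Python 3.8+ for `math.comb`). This sketch uses the common two-sided rule of summing the probabilities of every table with the same margins that is no more probable than the observed one; `scipy.stats.fisher_exact` should give the same two-sided value if SciPy is available. The second call applies the same function to the dieting table.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    With the row and column totals held fixed, the top-left count follows a
    hypergeometric distribution. The p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def ways(x):
        # Number of tables with x in the top-left cell and the same margins.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = ways(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    extreme = sum(ways(x) for x in support if ways(x) <= observed)
    return extreme / comb(n, row1)

print(round(fisher_exact_two_sided([[4, 8], [9, 6]]), 2))   # 0.25
print(round(fisher_exact_two_sided([[1, 9], [11, 3]]), 4))  # 0.0028
```

The dieting table gives a p-value of about 0.003, well below 0.05, which is consistent with the answer "Reject the null hypothesis".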
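The chi-square statistic and expected-count questions rely on the same expected-count formula: row total × column total / grand total. A plain-Python sketch with no continuity correction; note that `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so it needs `correction=False` to reproduce 1.90.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Pearson chi-square statistic (no Yates continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (observed - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for observed, exp in zip(obs_row, exp_row)
    )

# Red/Green by gender: rows Male, Female; columns Red, Green.
print(round(chi_square_statistic([[4, 8], [9, 6]]), 2))  # 1.9

# Medication: rows Cured, Uncured; columns Medication 1, Medication 2.
print(round(expected_counts([[58, 39], [22, 57]])[1][0]))  # 36
```

With 1 degree of freedom the 5% critical value is 3.84, so 1.90 is not significant, in line with the Fisher p-value of 0.25 for the same table.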
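The Wald interval in the logistic regression question is the estimate plus or minus the critical z-value times the standard error. A minimal sketch; `wald_interval` is an illustrative name.

```python
def wald_interval(estimate, standard_error, z=1.96):
    """Wald confidence interval: estimate +/- z * standard error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

low, high = wald_interval(7.399, 3.710)
print(f"({low:.2f}, {high:.2f})")  # (0.13, 14.67)
```

This interval is on the log-odds (coefficient) scale; exponentiating both endpoints would give an interval for the odds ratio instead.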
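The nonsignificant-variable question uses the Wald z score, coefficient divided by standard error, compared with 1.96 for a two-sided test. The significance level of 0.05 is an assumption here, since the question does not state one; the coefficients and standard errors are copied from the output table.

```python
# (coefficient, standard error) pairs copied from the output table.
coefficients = {
    "(Intercept)": (-4.13, 0.964),
    "Variable 1": (0.08, 0.026),
    "Variable 2": (0.185, 0.057),
    "Variable 3": (0.939, 0.225),
    "Variable 4": (0.001, 0.004),
    "Variable 5": (0.043, 0.01),
}

Z_CRITICAL = 1.96  # two-sided test at alpha = 0.05 (assumed)

def nonsignificant_terms(table, z_critical=Z_CRITICAL):
    """Return names whose Wald z = coefficient / SE has |z| below the critical value."""
    return [name for name, (coef, se) in table.items() if abs(coef / se) < z_critical]

for name, (coef, se) in coefficients.items():
    print(f"{name:12s} z = {coef / se:8.3f}")
print(nonsignificant_terms(coefficients))  # ['Variable 4']
```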
