In 2009, a large Midwestern University wanted to give a report to the state’s board of regents to justify the continued expenditures for their study abroad programs. In particular, they wanted to show that students who studied abroad had better language proficiency than their peers who did not study abroad. Every entering student is required to take a language proficiency/placement exam at the beginning of their first year.
Four years later, the study abroad office took a random sample of 1000 students who had completed more than the 3-semester minimum required for their general education credits and administered an additional proficiency exam to those students. Of the 1000 students, 221 had completed one or more semesters abroad, whereas 779 had not studied abroad.
Of the 221 who studied abroad, 193 were found to be proficient or above in their language skills, whereas 534 of the students who had not studied abroad demonstrated proficient or above language skills.
a) (3 points) Can this study be considered an experiment? Why or why not?
This study is not an experiment. An experiment requires random assignment of subjects to treatments and a researcher who manipulates the independent variable, which is what allows the researcher to infer causation. Nothing is manipulated in this study: students chose for themselves whether to study abroad. This is an observational study.
b) (8 points) Let $p_1$ represent the proportion of students who studied abroad who are proficient. Let $p_2$ represent the proportion of students who did not study abroad who are proficient. Find the values for each proportion, then list the additional information necessary to construct a 90 percent confidence interval for $p_1 - p_2$, including conditions and the formula. Then compute the confidence interval.
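Proportion proficient, studied abroad = 193/221 = 0.8733
Proportion proficient, did not study abroad = 534/779 = 0.6855
Difference = 0.8733 - 0.6855 = 0.1878
Standard error = 0.02788
ME = 1.645 × 0.02788 = 0.04586
Lower limit = 0.14195
Upper limit = 0.23367
z-score (pooled test of equal proportions) = 5.5314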
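The values above can be reproduced from the four counts in the problem. The following is a minimal sketch, assuming the usual large-sample (Wald) interval with an unpooled standard error and the critical value z* = 1.645 for 90 percent confidence; the variable names are illustrative only.

```python
import math

# Counts taken from the problem statement.
x1, n1 = 193, 221   # proficient / total, studied abroad
x2, n2 = 534, 779   # proficient / total, did not study abroad

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Success/failure condition: each group needs at least 10 successes and 10 failures.
conditions_ok = min(x1, n1 - x1, x2, n2 - x2) >= 10

# Unpooled standard error, used for the confidence interval.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumption: z* = 1.645 is the critical value for 90 percent confidence.
z_star = 1.645
me = z_star * se
lower, upper = diff - me, diff + me

# Pooled standard error, used for the z statistic under H0: p1 = p2.
p_pool = (x1 + x2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pool

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, difference = {diff:.4f}")
print(f"SE = {se:.5f}, ME = {me:.5f}")
print(f"90% CI = ({lower:.5f}, {upper:.5f})")
print(f"pooled z = {z:.4f}, conditions met: {conditions_ok}")
```

Note that the interval uses the unpooled standard error, while the z-score uses the pooled one, because the test assumes the two proportions are equal under the null hypothesis.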
c) (3 points) Using the confidence interval you reported in part (b), can you reject the null hypothesis that the proportion of students who are language proficient with study abroad is the same as the proportion of students who are language proficient without study abroad? Explain.
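The 90 percent confidence interval for the difference, (0.14195, 0.23367), does not contain 0, so we can reject the null hypothesis that the two proportions are equal at the 10 percent significance level. The p-value is < .0001, which agrees with this. We conclude that there is significant evidence of a difference in language proficiency between students who studied abroad and those who did not.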
d) (2 points) Give at least one good reason why it would have been better if the number of students in the study who had studied abroad had been equal to the number of students who had not studied abroad.
It would have been better for the two groups to be equal in size because, for a fixed total number of students, the standard error of the difference between the two proportions is smallest when the groups are about the same size. With only 221 students in the study abroad group, that group's proportion is estimated less precisely, which widens the confidence interval.
Question 2: (24 points)
A research laboratory is developing new compounds to provide relief from a specific allergy. In an experiment with 60 volunteer subjects who suffered from the allergy, the amounts of two active ingredients (ingredient A and ingredient B) were varied. Two levels (0.2g and 0.4g) were used for ingredient A and three levels (1g, 2g and 3g) were used for ingredient B. Consequently, six different compounds were made from the combinations of levels for these two ingredients. The 60 subjects were randomly assigned to the six compounds so that 10 subjects were assigned to each of the six compounds. Each subject took a pill containing the assigned compound and the number of hours of relief from allergy symptoms was recorded for each subject.
a) (3 points) Identify the experimental units (the who), the treatments, and the response that was measured on each subject.
The experimental units are the 60 volunteers who suffer from the allergy. The treatments are the six compounds formed by combining the two levels of ingredient A with the three levels of ingredient B. The response is the number of hours of relief from allergy symptoms.
b) (2 points) Randomization was used in this experiment. What is the reason (or reasons) for randomly assigning subjects to treatments?
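Randomly assigning subjects to treatments prevents systematic bias in who receives which compound. It also tends to balance lurking variables across the six treatment groups, so that differences in the response can be attributed to the treatments.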
c) (6 points) Considering the experimental design, list at least three scientific questions which the national chain would want to answer from the data generated by this study. For each question, write an appropriate null hypothesis and an appropriate alternative hypothesis.
Does a higher dosage of ingredients A and B give more relief than a lower dosage?
Does ingredient B offer more relief than ingredient A?
Does ingredient A offer more relief than ingredient B?
d) (8 points) Outline an analysis of variance that would be useful for testing some or all of the hypotheses you presented in part (c). You should have one line in your ANOVA table for each source of variation and you should report the value for the corresponding degrees of freedom. Using the outline of the ANOVA table, describe how you would test some or all of the hypotheses you presented in part (c). (Note that you do not have any data, so you cannot compute numerical values for sums of squares, mean squares, or test statistics.)
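| Source of variation | SS | Df | MS | F | P |
|---|---|---|---|---|---|
| Ingredient A | | 1 | | | |
| Ingredient B | | 2 | | | |
| Ingredient A x B | | 2 | | | |
| Within | | 54 | | | |
| Total | | 59 | | | |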
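The degrees of freedom in a balanced two-factor design follow directly from the number of levels and the number of subjects per compound. The sketch below assumes the standard two-way ANOVA with interaction; the variable names are illustrative.

```python
# Design taken from the problem statement.
a_levels = 2      # ingredient A: 0.2g, 0.4g
b_levels = 3      # ingredient B: 1g, 2g, 3g
n_per_cell = 10   # subjects per compound

n_total = a_levels * b_levels * n_per_cell

df_a = a_levels - 1                          # main effect of A
df_b = b_levels - 1                          # main effect of B
df_ab = df_a * df_b                          # A x B interaction
df_within = n_total - a_levels * b_levels    # error (within compounds)
df_total = n_total - 1

for name, df in [("Ingredient A", df_a), ("Ingredient B", df_b),
                 ("Ingredient A x B", df_ab), ("Within", df_within),
                 ("Total", df_total)]:
    print(f"{name:18s} {df}")
```

The component degrees of freedom must add up to the total, which is a quick check on any ANOVA outline.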
e) (3 points) Although blocking was not done in this experiment, describe the potential benefit of using blocking in an experiment. Identify a potential blocking factor for this experiment, explain why it would be a good blocking factor, and describe how the experiment could be performed as a randomized block design.
By blocking, you can isolate the variability attributable to differences between blocks, so you can see the differences caused by the treatments more clearly. It also helps to control nuisance factors. A potential blocking factor one could use is to block all the levels with ingredient A at 0.2g and block all the levels with ingredient A at 0.4g. By doing this, one can see if there is a significant difference between the two levels of ingredient A.
f) (2 points) Describe what would need to be done to make this a double-blind experiment.
To make this a double blind experiment the person receiving the treatment wouldn’t know what level of treatment they are receiving and the person administering the treatment wouldn’t know what treatment the participant is receiving either.
Question 3: (12 points)
A random poll of voting age citizens in Montana was conducted to gauge the current partisan makeup of the state to prepare for the upcoming primary election. Participants were asked to identify their gender and party affiliation.
| Gender | Democrat | Independent | Republican |
|---|---|---|---|
| Men | 36 | 24 | 45 |
| Women | 48 | 16 | 33 |
Do these data suggest that there are significant differences in the distribution of partisan affiliations for men and women? Perform an appropriate test to address this question. Your response should include:
i) A precise statement of the null hypothesis and the alternative hypothesis.
ii) Checks of the conditions for inference.
Randomization is met: the participants are from a random poll of voting age citizens. The sample is large (n = 202), and every expected cell count is at least 5 (the smallest is about 19.2). The 10% condition is met: 202 respondents is less than 10% of the entire population.
iii) The formula and value of your test statistic, relevant degrees of freedom, and a p-value.
iv) A clear statement of your conclusion in the context of this study.
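One way to carry out the requested test is sketched below. It assumes that a chi-square test of independence between gender and party affiliation is the intended method; the p-value step uses the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(-x/2), so no statistics library is needed.

```python
import math

# Observed counts from the poll (rows: men, women; columns: Dem, Ind, Rep).
observed = [
    [36, 24, 45],
    [48, 16, 33],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"n = {n}, smallest expected count = {min(min(r) for r in expected):.2f}")
print(f"chi-square = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

With these counts the p-value falls between 0.05 and 0.10, so the conclusion depends on the significance level chosen.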
Question 4: (48 points)
As part of a study of student performance at a large university, data were collected on a random sample of freshman computer science majors. Of particular interest was the cumulative grade point average (GPA) at the end of each student’s first three semesters at the university. Other information recorded on each student at the time the student enrolled at the university includes average high school grades in mathematics (HSM), average high school grades in science (HSS), and average high school grades in English and communication courses (HSE). Researchers at the university were interested in predicting the GPA’s for computer science majors at the end of the first three semesters of enrollment from the information on high school grades. In this data set, high school grades were coded on a scale from 1 to 10, with 10 corresponding to an A, 9 to an A-, 8 to a B+, etc. At this university, GPA’s are recorded on a scale from 0 to 6, with 6 corresponding to a straight A performance. Results for 224 computer science majors were included in this study; there were 145 men and 79 women.
a) The researchers wanted to know if there was a significant difference between the average GPA’s at the end of three semesters of study for men and women computer science majors. They created the following box plot and compiled the following summary statistics.
GPA Summary Statistics
| Gender | Number | Mean | Standard Deviation | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Men | 145 | 4.6077 | 0.8068 | 4.4753 | 4.7402 |
| Women | 79 | 4.6857 | 0.7288 | 4.5225 | 4.8489 |
i) (4 points) What information is provided by the side-by-side box plots? Do conditions for inference appear to be satisfied?
The box plots show that the two groups have about the same center (the medians are about the same), about the same variation, and about the same IQR. The distributions may be slightly skewed to the left, since the median sits toward the top of the box. With samples of 145 and 79 students, mild skewness is not a problem, so the normality condition is reasonably met.
ii) (2 points) State the null hypothesis and the alternative hypothesis that the researchers should use to answer their question.
iii) (2 points) Explain why a two sample t-test should be used in this situation instead of a paired t-test.
A two-sample t-test should be used because the men and the women are two independent groups of different sizes (145 and 79). No student in one group is naturally matched with a student in the other group, so there are no pairs to compare. Students also enter with different levels of preparation and take different classes each semester, so there is no basis for matching one student's class load with another's.
iv) (6 points) Perform the t-test. Report the value of the test statistic, its degrees of freedom, and a p-value. State your conclusion in the context of this study.
T-stat = -0.7148512
Df = 222
p-Value = 0.4755
Standard error = 0.10911361
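These values are consistent with a pooled (equal-variance) two-sample t-test computed from the summary statistics. The sketch below makes that assumption explicit; the helper functions `t_pdf` and `two_sided_p` are hypothetical names, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics library.

```python
import math

# Summary statistics from the table in part (a).
n_men, mean_men, sd_men = 145, 4.6077, 0.8068
n_women, mean_women, sd_women = 79, 4.6857, 0.7288

# Pooled variance (assumes equal population variances).
df = n_men + n_women - 2
sp2 = ((n_men - 1) * sd_men**2 + (n_women - 1) * sd_women**2) / df
se = math.sqrt(sp2 * (1 / n_men + 1 / n_women))
t_stat = (mean_men - mean_women) / se


def t_pdf(x, v):
    """Density of Student's t distribution with v degrees of freedom."""
    log_c = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
             - 0.5 * math.log(v * math.pi))
    return math.exp(log_c - (v + 1) / 2 * math.log1p(x * x / v))


def two_sided_p(t, v, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    h = 2 * a / steps  # steps must be even
    total = t_pdf(-a, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-a + i * h, v)
    return 1 - total * h / 3


p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.4f}, df = {df}, SE = {se:.5f}, p = {p_value:.4f}")
```

A p-value this large means the data give no evidence of a difference between the mean GPA of men and the mean GPA of women.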
v) (4 points) Explain why checking if the 95% confidence interval for the mean GPA for men overlaps with the 95% confidence interval for the mean GPA for women is not an appropriate method for testing the null hypothesis you stated in part (ii).
b) (4 points) As a first step in examining the relationship between GPA after the first three semesters at the university and average high school grades in mathematics (HSM), science (HSS) and English (HSE), the researchers computed the following correlation results and the following scatterplot matrix.
Correlations
| | GPA | HSM | HSS | HSE |
|---|---|---|---|---|
| GPA | 1.0000 | 0.4365 | 0.3294 | 0.2890 |
| HSM | 0.4365 | 1.0000 | 0.5757 | 0.4469 |
| HSS | 0.3294 | 0.5757 | 1.0000 | 0.5794 |
| HSE | 0.2890 | 0.4469 | 0.5794 | 1.0000 |
Pairwise Correlations
| Variable | by Variable | Correlation | Count | Lower 95% | Upper 95% | Signif Prob |
|---|---|---|---|---|---|---|
| HSM | GPA | 0.4365 | 224 | 0.3240 | 0.5369 | <.0001* |
| HSS | GPA | 0.3294 | 224 | 0.2073 | 0.4414 | <.0001* |
| HSS | HSM | 0.5757 | 224 | 0.4809 | 0.6572 | <.0001* |
| HSE | GPA | 0.2890 | 224 | 0.1641 | 0.4048 | <.0001* |
| HSE | HSM | 0.4469 | 224 | 0.3355 | 0.5460 | <.0001* |
| HSE | HSS | 0.5794 | 224 | 0.4851 | 0.6603 | <.0001* |
Summarize what these results tell you about relationships between the four variables, GPA, HSM, HSS, and HSE.
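The 95% limits reported for each correlation can be checked by hand. The sketch below assumes the limits come from the Fisher z transformation, which is a common method but is not confirmed by the output itself; it reproduces the interval for the correlation between HSM and GPA.

```python
import math

r = 0.4365   # correlation between HSM and GPA
n = 224      # number of students

# Fisher z transformation: atanh(r) is approximately normal with SE 1/sqrt(n - 3).
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)

# Assumption: 1.96 is the critical value for 95 percent confidence.
z_star = 1.96
lower = math.tanh(z - z_star * se)
upper = math.tanh(z + z_star * se)

print(f"95% CI for r = ({lower:.4f}, {upper:.4f})")
```

Because none of the six intervals contains 0, every pair of variables is significantly positively correlated, although the correlations with GPA are only weak to moderate.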
c) The following results (produced by JMP) are from the regression of GPA on HSM, HSS, and HSE.
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F | R-Square |
|---|---|---|---|---|---|---|
| Model | 3 | 27.71233 | 9.23744 | 18.8606 | <.0001 | 0.2046 |
| Error | 220 | 107.75046 | 0.48977 | | | |
| C. Total | 223 | 135.46279 | | | | |

Parameter Estimates
| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 2.5898766 | 0.294243 | 8.80 | <.0001 |
| HSM | 0.1685666 | 0.035492 | 4.75 | <.0001 |
| HSS | 0.0343156 | 0.037559 | 0.91 | 0.3619 |
| HSE | 0.0451018 | 0.038696 | 1.17 | 0.2451 |
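The summary quantities in the regression output are tied together by a few identities, which the following sketch checks using only the sums of squares, degrees of freedom, and the HSM estimate and standard error reported above. The variable names are illustrative.

```python
import math

# Values copied from the JMP Analysis of Variance output.
ss_model, df_model = 27.71233, 3
ss_error, df_error = 107.75046, 220

ss_total = ss_model + ss_error
ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_ratio = ms_model / ms_error      # overall F test of the model
r_square = ss_model / ss_total     # share of GPA variation explained
rmse = math.sqrt(ms_error)         # estimated residual standard deviation

# Each t ratio is the estimate divided by its standard error (HSM shown here).
t_hsm = 0.1685666 / 0.035492

print(f"MS model = {ms_model:.5f}, MS error = {ms_error:.5f}")
print(f"F = {f_ratio:.4f}, R-Square = {r_square:.4f}, RMSE = {rmse:.4f}")
print(f"t ratio for HSM = {t_hsm:.2f}")
```

An R-Square of about 0.20 means the three high school grade averages together explain roughly 20 percent of the variation in GPA.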
i) (2 points) Interpret the R-Square value.
ii) (6 points) Write out the formula for the equation for predicting GPA’s for computer science students at the end of the first three semesters of enrollment from high school academic performance as summarized by the HSM, HSS, and HSE variables. Interpret the estimates of the regression coefficients for this model.
iii) (4 points) The coefficients for HSS and HSE are not statistically significant for this model. Does this imply that neither HSS nor HSE provide any information for predicting GPA? Should both HSS and HSE be deleted from the model? Explain.
d) (4 points) Two of the partial residual (leverage) plots produced by JMP are shown below. Summarize the information in these plots.
e) (4 points) A plot of the residuals versus the predicted values is shown below. Summarize the information this plot provides about how well the model describes the data and if conditions for inference are well satisfied.
f) (6 points) A new student, Jane, will enter the university as a computer science major in Fall 2017. Jane has average scores of 7, 7, and 9 in her high school classes on mathematics, science, and English, respectively. The model predicts that her GPA at the end of three semesters at the university will be 4.42. Show how this prediction is obtained by inserting appropriate values into the prediction equation you reported in part (c) of this problem. The standard error of this GPA estimate (mean estimate) is 0.085. Show how to construct an interval such that you would have 95 percent confidence that the interval will contain Jane’s GPA at the end of her first three semesters at the university.
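The prediction and the interval can be sketched as follows. Because the interval is for one student's GPA rather than for a mean, the sketch assumes a prediction interval, which combines the mean square error from the ANOVA table with the given standard error of the mean estimate. The critical value 1.9708 is an approximate t value for 220 degrees of freedom and is an assumption, not part of the output.

```python
import math

# Parameter estimates from the regression output.
intercept, b_hsm, b_hss, b_hse = 2.5898766, 0.1685666, 0.0343156, 0.0451018

# Jane's high school averages in mathematics, science, and English.
hsm, hss, hse = 7, 7, 9

predicted = intercept + b_hsm * hsm + b_hss * hss + b_hse * hse

ms_error = 0.48977   # mean square error from the ANOVA table
se_mean = 0.085      # standard error of the mean estimate (given)

# Standard error for predicting a single new observation.
se_pred = math.sqrt(ms_error + se_mean**2)

# Assumption: approximate 97.5th percentile of t with 220 degrees of freedom.
t_star = 1.9708
lower = predicted - t_star * se_pred
upper = predicted + t_star * se_pred

print(f"predicted GPA = {predicted:.2f}")
print(f"95% prediction interval = ({lower:.2f}, {upper:.2f})")
```

The interval is wide because most of the uncertainty comes from student-to-student variation around the regression line, not from estimating the mean.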
Question 5: (40 points)
For the study described in problem 4, the researchers also collected data on each student’s score on the quantitative part of the SAT exam (SATM) and the verbal part of the SAT exam (SATV).
These two SAT scores are included with the HSM, HSS, and HSE scores and the GPA at the end of three semesters in the data file posted as male_gpa.csv, for the 145 male computer science majors in the study.
Use these data to build a good prediction equation for GPA at the end of the three semesters for male computer science majors. Your report should include the following parts.
(a) (12 points) Briefly report on each of the steps you took to develop a good prediction model. There is no need to include all of the details for each step of your investigation, but you should write one or two sentences describing what you learned from each step of your investigation. If you think it is important to include a graphical display or table to make your point please include the graph or table in your report.
(b) (12 points) Report the R-Square value, the ANOVA table, and a table of parameter estimates for the prediction model you think is best. Interpret the parameters in your model. You may have found more than one good model in part (a), and you can comment on those models, but you only need to talk about the resul