Homework 1 - Solutions
Question 1
Choose the correct statement regarding the sum of residuals calculated using Ordinary Least
Squares (OLS).
A. The sum of residuals will always be nonzero whatever the form of the linear regression
as long as you are using OLS to estimate the coefficients.
B. The sum of residuals will always be equal to zero if you include an intercept term in your
model and you are using OLS to estimate the coefficients.
C. The sum of residuals may or may not be zero when using OLS and the R software
makes mathematical adjustments to make it zero.
D. The sum of residuals may or may not be zero when using OLS and the R software
makes no mathematical adjustments because it is not needed.
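Sol: B Explanation: When the model includes an intercept, the OLS normal equation for the
intercept forces the residuals to sum to zero. Informally, the intercept is the catch-all term
that absorbs anything not accounted for by the independent variables. Without an intercept,
the sum of residuals need not be zero.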
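A quick check in R can make this concrete. The sketch below is illustrative only: it uses the
built-in cars data (the dataset used in Questions 3 - 6) and the object names fit.int and
fit.noint, which are assumptions and not part of the homework. It compares the residual sum
with and without an intercept.

```r
# Illustrative check; object names are assumptions, not from the homework.
# cars ships with R's built-in datasets package.
fit.int   <- lm(dist ~ speed, data = datasets::cars)      # model with intercept
fit.noint <- lm(dist ~ 0 + speed, data = datasets::cars)  # intercept removed

sum.int   <- sum(residuals(fit.int))    # zero up to floating-point error
sum.noint <- sum(residuals(fit.noint))  # not constrained to be zero

print(c(with.intercept = sum.int, without.intercept = sum.noint))
```

The first sum is zero only up to floating-point rounding, so it is compared against a small
tolerance rather than tested for exact equality.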
Question 2
Choose the correct statement regarding the error terms in the assumption of Ordinary Least
Squares (OLS).
A. The error terms are normally distributed with a constant non-zero
mean and constant Variance
B. The variance of error terms may or may not be constant as long as the
terms are normally distributed with mean equal to zero
C. The error terms follow lognormal distribution with mean equal to zero
and constant variance
D. The error terms are normally distributed with mean equal to zero and a
constant variance
Sol: D Explanation: The OLS model assumes that the error terms are normally distributed with
mean equal to zero and constant variance (homoscedasticity). The residuals are the sample
estimates of these unobserved error terms.
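Since the error terms are not observed, these assumptions are usually examined through the
residuals of a fitted model. The sketch below shows one informal way to do that on the
built-in cars data. The choice of the Shapiro-Wilk test and of a simple split-half variance
ratio is an assumption made for illustration, not a method prescribed by the homework.

```r
# Sketch: informal checks of the error-term assumptions using residuals.
fit <- lm(dist ~ speed, data = datasets::cars)
res <- residuals(fit)

# Mean of residuals: zero up to rounding when the model has an intercept.
mean.res <- mean(res)

# Normality: Shapiro-Wilk test (a small p-value suggests non-normal residuals).
sw <- shapiro.test(res)

# Crude constant-variance check: residual variance in the upper half of the
# fitted values divided by the variance in the lower half (near 1 is reassuring).
low <- fitted(fit) <= median(fitted(fit))
var.ratio <- var(res[!low]) / var(res[low])

print(mean.res)
print(sw$p.value)
print(var.ratio)
```

A plot of residuals against fitted values (plot(fitted(fit), res)) is a common visual
complement to these numbers.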
Questions 3 - 6
The National Traffic Study Institute is conducting a study to find out the relationship between
the speed at which the car is moving and the distance it takes to stop after applying the brakes.
You were hired as a statistician to work on this problem. The data can be accessed as follows:
install.packages("Ecdat")
library(Ecdat)
data(cars)
Using the help command in the R console (?cars), you can see the variables present in the
dataset and their units: speed (in mph) and dist (in ft).
Use this dataset for Questions 3 to 6.
Question 3
Let’s try to find out if there is a correlation between the distance needed to stop and the speed
at which the car is moving.
What correlation value do you find when doing this in R?
A. 0
B. 0.72
C. 0.81
D. 1
Ans: C Explanation: cor(cars$speed, cars$dist) = 0.807, which rounds to 0.81
Question 4
Would you say that the distance needed to stop and the speed of the car are:
A. Not correlated
B. Inversely correlated
C. Well correlated
D. Perfectly correlated
Ans: C Explanation: Well correlated, because the value is positive and close to 1 but not
exactly 1, so the correlation is strong but not perfect.
Question 5
Now, let’s fit a linear model with distance needed to stop as the response and speed as the
predictor. What are the proportion of variation explained by speed, the intercept, and the
coefficient of speed?
A. 0.65, -17.58 and 3.93
B. 0.65, 17.58 and 3.93
C. 0.65, 8.28 and 0.16
D. 0.89, 0 and 0.31
Ans: A Explanation: The proportion of variation explained by speed is the R-squared value =
0.65 (that is, 65%); the intercept (from the regression summary table) = -17.58; the
coefficient of speed (from the table again) = 3.93
Question 6
Now suppose we need to change the units of distance needed to stop from feet to meters and
speed from mph to meters per second because we need the results in standard units. What
would be the results for the proportion of variation explained by speed, the intercept, and
the coefficient of speed?
A. 0.65, -5.36 and 1.19
B. 0.65, -5.36 and 2.68
C. 0.65, -17.58 and 3.93
D. 0.65, 8.28 and 0.16
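Ans: B Explanation: First, change the dataset into the proper units. Convert speed from miles
per hour to meters per second (multiply by 0.44704) and convert distance from feet to meters
(multiply by 0.3048). Then, reuse the same steps as in Qn 5 to get the regression summary and
read off the same outputs. R-squared is unchanged by a change of units (0.65); the intercept
scales by 0.3048 (-17.58 x 0.3048 = -5.36); the coefficient of speed scales by 0.3048 / 0.44704
(3.93 x 0.3048 / 0.44704 = 2.68).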
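The effect of the conversion can be verified directly. The sketch below refits the model in
SI units and compares the coefficients with the rescaled originals. The names mph.to.ms,
ft.to.m, cars.si, fit.si and expected are illustrative assumptions, not names from the
homework.

```r
# Sketch: how the OLS estimates change under a unit conversion.
mph.to.ms <- 0.44704   # miles per hour -> meters per second
ft.to.m   <- 0.3048    # feet -> meters

fit <- lm(dist ~ speed, data = datasets::cars)   # original units (mph, ft)

cars.si <- data.frame(speed = datasets::cars$speed * mph.to.ms,
                      dist  = datasets::cars$dist * ft.to.m)
fit.si <- lm(dist ~ speed, data = cars.si)       # SI units (m/s, m)

# The intercept scales with the response factor only;
# the slope scales by (response factor) / (predictor factor).
expected <- c(coef(fit)[1] * ft.to.m,
              coef(fit)[2] * ft.to.m / mph.to.ms)

print(round(coef(fit.si), 2))
print(round(summary(fit.si)$r.squared, 2))
```

Because both conversions are pure rescalings, the fitted line describes the same relationship
and R-squared does not change.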
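CODE FOR QN 3 to 6
library(Ecdat)
data(cars)   # use data(), not load(): load() reads saved .RData files from disk
# Qn 3: correlation between speed and stopping distance
cor(cars$speed, cars$dist)
# Qn 5: regress stopping distance on speed; the fit is named so it does not mask lm()
fit <- lm(dist ~ speed, data = cars)
summary(fit)
## code to change units (Qn 6): mph -> m/s and ft -> m
cars.si <- data.frame(speed = cars$speed * 0.44704, dist = cars$dist * 0.3048)
fit.si <- lm(dist ~ speed, data = cars.si)
summary(fit.si)
# confidence interval (default 95%) for the mean stopping distance at speed = 7.5
new.dat <- data.frame(speed = 7.5)
predict(fit, newdata = new.dat, interval = 'confidence')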
Question 7
If the p-value of a particular parameter in your linear regression model is equal to 1.67e-14,
what does it say about the coefficient of that parameter?
A. The null hypothesis corresponding to this parameter can be rejected and hence
coefficient of the parameter is equal to zero.
B. The nature of the coefficient of the parameter is ambiguous and hence we need to
change the model.
C. The null hypothesis corresponding to this parameter can be rejected and hence
coefficient of the parameter is significant and hence not equal to zero.
D. The null hypothesis corresponding to this parameter can be accepted and hence
coefficient of the parameter is equal to zero.
Answer: C Explanation: The null hypothesis for each parameter in the linear regression model
is that its coefficient is zero (there is no relationship between that X variable and the Y
variable). The p-value lies between 0 and 1, and the closer it is to zero, the stronger the
evidence against the null hypothesis. The p-value in this question is 1.67e-14, which is
extremely close to zero. Hence, we can reject the null hypothesis: the coefficient of that
parameter is significantly different from zero, which means there is a relationship between
the parameter and the dependent variable.
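As an illustration of where such a p-value comes from, the sketch below reads the p-value for
the slope out of a regression summary and compares it with a significance level. It uses the
built-in cars data, so its p-value is not the 1.67e-14 quoted in the question. The names
coef.table, p.speed, alpha and reject.null, and the 0.05 threshold, are assumptions made for
the example.

```r
# Sketch: reading a coefficient's p-value from the regression summary.
fit <- lm(dist ~ speed, data = datasets::cars)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef.table <- summary(fit)$coefficients
p.speed <- coef.table["speed", "Pr(>|t|)"]

alpha <- 0.05                     # assumed significance level
reject.null <- p.speed < alpha    # TRUE: reject H0 that the coefficient is zero

print(p.speed)
print(reject.null)
```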