ISYE 6414 SP21 Midterm2 Solutions
Background
For this exam, you will be looking at data pertaining to mortality rates in countries throughout the world
and using this data to predict the Adult Mortality Rate for each country.
The data consists of a data frame with 771 observations on the following 9 variables:
1. Status: Developed or Developing Country (categorical)
2. percentage expenditure: Healthcare spending as a % of per capita GDP (continuous)
3. Measles: Reported cases per 1000 pop. (discrete)
4. under-five deaths: Deaths of children under 5 per 1000 pop. (discrete)
5. Polio: % of 1-year old population that has been immunized (discrete)
6. Diphtheria: % of 1-year old population that has been immunized (discrete)
7. GDP: Per capita GDP in USD (continuous)
8. Schooling: Number of years of schooling, avg. for pop. (continuous)
9. Adult Mortality: Total Mortality Rate for all Adults (15-60) per 1000 pop. (discrete)
Read the data
Read the data and answer the questions below using the supplied R Markdown / Jupyter notebook file.
# Load relevant libraries (add here if needed)
library(car)
## Loading required package: carData
library(aod)
library(MASS)
# Read the data set
mortalityFull = read.csv("Mortality.csv", header = TRUE)
row.cnt = nrow(mortalityFull)
# Split the data into training and testing sets
mortalityTest = mortalityFull[(row.cnt-9):row.cnt,]
mortality = mortalityFull[1:(row.cnt-10),]
Note: Use mortality as your data set for the following questions unless otherwise stated.
Note: Treat all variables as quantitative variables, except for Status.
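With 771 rows, the split above puts the last 10 rows in mortalityTest and the first 761 in mortality. A minimal sketch of the same indexing on a toy data frame follows; `toy`, `toyTrain`, `toyTest`, and the `id` column are hypothetical stand-ins, not part of the exam data. It also shows Status being turned into a factor explicitly. (`lm()` would convert a character column on its own, so this step is just a way to make the note about Status visible.)

```r
# Toy stand-in for the data read from Mortality.csv (values are made up).
toy <- data.frame(
  id = 1:25,
  Status = rep(c("Developing", "Developed"), length.out = 25)
)
row.cnt <- nrow(toy)

# Same indexing as the exam code: last 10 rows for testing, the rest for training.
toyTest  <- toy[(row.cnt - 9):row.cnt, ]
toyTrain <- toy[1:(row.cnt - 10), ]

# The two pieces should not overlap and should together cover every row.
n_overlap <- length(intersect(toyTrain$id, toyTest$id))
n_total   <- nrow(toyTrain) + nrow(toyTest)

# Treat Status as categorical; every other column stays numeric.
toyTrain$Status <- factor(toyTrain$Status)
status_levels <- levels(toyTrain$Status)
print(status_levels)
```

Factor levels are sorted alphabetically by default, so "Developed" becomes the baseline level and a fitted model would report a coefficient for "Developing".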
Question 1 - 11 points
A) Build a multiple linear regression model named model1 with Adult Mortality as the response variable
and all other variables as predicting variables. Include an intercept. Display the summary table of the
model.
B) Is the overall regression significant at the 0.01 significance level (alpha = 0.01)? Explain.
C) Using model1, calculate the Cook’s distance of the points in the dataset and create a plot for the
Cook’s Distances.
D) Identify the row number of the observation with the highest Cook’s distance.
E) Remove this observation from the mortality dataset. Call this new dataset mortality2 and create a new
multiple linear regression model, called model2, using the same predictors as model1 with Adult Mortality
as the response. Display the summary table of this model. Are there any significant differences between
the models with and without the outlier? Would you classify this observation as influential?
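Parts A through E can be sketched end to end on simulated data. The block below is only an illustration under stated assumptions, not the exam's official solution: the data frame `sim`, its three predictors, the extreme point planted in row 17, and the dotted column name `Adult.Mortality` (what `read.csv` produces from a header like "Adult Mortality" under default name checking) are all hypothetical stand-ins for the real data.

```r
# Simulated stand-in for the training data (NOT the exam data).
set.seed(6414)
n <- 200
sim <- data.frame(
  Status    = factor(sample(c("Developed", "Developing"), n, replace = TRUE)),
  GDP       = runif(n, 500, 50000),
  Schooling = runif(n, 4, 18)
)
sim$Adult.Mortality <- 400 - 15 * sim$Schooling - 0.001 * sim$GDP + rnorm(n, sd = 30)

# Plant one extreme, high-leverage observation so there is something to find.
sim$GDP[17] <- 200000
sim$Adult.Mortality[17] <- 2000

# A) Full model: response on all other columns, intercept included by default.
model1 <- lm(Adult.Mortality ~ ., data = sim)
print(summary(model1))

# B) Overall F-test: p-value from the F statistic stored in the summary.
fstat <- summary(model1)$fstatistic
p_overall <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

# C) Cook's distance for every observation, with a needle plot.
cook <- cooks.distance(model1)
plot(cook, type = "h", xlab = "Row number", ylab = "Cook's distance")

# D) Row with the largest Cook's distance.
worst <- unname(which.max(cook))

# E) Drop that row, refit, and compare coefficients side by side.
sim2 <- sim[-worst, ]
model2 <- lm(Adult.Mortality ~ ., data = sim2)
print(summary(model2))
coef_compare <- cbind(model1 = coef(model1), model2 = coef(model2))
print(coef_compare)
```

On the real data the same calls would take `data = mortality`. The overall regression is significant at the 0.01 level when `p_overall` is below 0.01. Whether the flagged observation counts as influential depends on how much the coefficient estimates, their significance, and the fit statistics change between the two summaries, which the simulated numbers here cannot tell you.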