
Georgia Institute of Technology, ISYE 6414: 6414_Spring21_Midterm_2_Part_2_Solutions

ISYE 6414 SP21 Midterm 2 Solutions

Background

For this exam, you will be looking at data pertaining to mortality rates in countries throughout the world and using these data to predict the adult mortality rate for each country. The data set is a data frame with 771 observations of the following 9 variables:

1. Status: Developed or Developing country (categorical)
2. percentage expenditure: Healthcare spending as a % of per capita GDP (continuous)
3. Measles: Reported cases per 1000 population (discrete)
4. under-five deaths: Deaths of children under 5 per 1000 population (discrete)
5. Polio: % of 1-year-olds who have been immunized (discrete)
6. Diphtheria: % of 1-year-olds who have been immunized (discrete)
7. GDP: Per capita GDP in USD (continuous)
8. Schooling: Average number of years of schooling for the population (continuous)
9. Adult Mortality: Total mortality rate for all adults (ages 15 to 60) per 1000 population (discrete)

Read the data

Read the data and answer the questions below using the supplied R Markdown / Jupyter notebook file.

    # Load relevant libraries (add here if needed)
    library(car)
    ## Loading required package: carData
    library(aod)
    library(MASS)

    # Read the data set
    mortalityFull = read.csv("Mortality.csv", header = TRUE)
    row.cnt = nrow(mortalityFull)

    # Split the data into training and testing sets:
    # the last 10 rows are held out as the test set
    mortalityTest = mortalityFull[(row.cnt-9):row.cnt,]
    mortality = mortalityFull[1:(row.cnt-10),]

Note: Use mortality as your data set for the following questions unless otherwise stated.

Note: Treat all variables as quantitative variables, except for Status.

Question 1 (11 points)

A) Build a multiple linear regression model named model1 with Adult Mortality as the response variable and all other variables as predicting variables. Include an intercept. Display the summary table of the model.

B) Is the overall regression significant at the 0.01 alpha level? Explain.
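A minimal sketch of parts A and B in R, run on a small simulated data frame because Mortality.csv is not reproduced here. The data frame `sim`, its columns, and the coefficients used to generate it are all hypothetical. The formula `AdultMortality ~ .` regresses the response on every other column, and `lm()` includes an intercept by default. The overall F-test p-value, which `summary()` also prints on its last line, is recomputed here from the stored F statistic with `pf()` and compared with alpha = 0.01.

```r
# Hypothetical stand-in for the mortality data: the real Mortality.csv is not
# reproduced here, so the column names and values below are invented.
set.seed(6414)
n <- 100
sim <- data.frame(
  Status    = factor(sample(c("Developed", "Developing"), n, replace = TRUE)),
  GDP       = rlnorm(n, meanlog = 8, sdlog = 1),
  Schooling = runif(n, min = 5, max = 18)
)
# Simulated response with a real dependence on Schooling and Status
sim$AdultMortality <- 300 - 12 * sim$Schooling +
  40 * (sim$Status == "Developing") + rnorm(n, sd = 30)

# Part A: response against every other column; the intercept is included by default
fit <- lm(AdultMortality ~ ., data = sim)
summary(fit)

# Part B: overall F-test of H0: all slope coefficients are zero
fs <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
p_overall < 0.01                # TRUE means significant at alpha = 0.01
```

On the exam data, the same two steps would use `mortality` and whatever column name `read.csv()` assigned to Adult Mortality (by default it replaces spaces and hyphens in column names with dots).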
C) Using model1, calculate the Cook’s distance for each observation in the dataset and plot the Cook’s distances.

D) Identify the row number of the observation with the highest Cook’s distance.

E) Remove this observation from the mortality dataset. Call this new dataset mortality2 and create a new multiple linear regression model, called model2, using the same predictors as model1 with Adult Mortality as the response. Display the summary table of this model. Are there any significant differences between the models with and without the outlier? Would you classify this observation as influential?
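The steps in parts C to E can be sketched in R as follows, again on simulated data (hypothetical columns `x1`, `x2`, `y`) with one deliberately planted high-leverage outlier so that the largest Cook's distance is easy to spot. `cooks.distance()` and `which.max()` are standard R functions; the dashed line at 1 marks a common rule-of-thumb cutoff, and 4/n is another convention in use.

```r
# Hypothetical data (invented columns) with one planted extreme point
set.seed(6414)
n <- 60
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- 5 + 2 * sim$x1 - 3 * sim$x2 + rnorm(n)
sim$x1[n] <- 6     # high-leverage predictor value ...
sim$y[n]  <- -20   # ... paired with a response far from the fitted plane

fit1 <- lm(y ~ ., data = sim)

# Part C: Cook's distance for every observation, plotted as vertical spikes
cook <- cooks.distance(fit1)
plot(cook, type = "h", xlab = "Observation", ylab = "Cook's distance")
abline(h = 1, lty = 2)   # rule-of-thumb cutoff at 1

# Part D: index of the row with the largest Cook's distance
idx <- which.max(cook)
idx

# Part E: drop that row, refit with the same formula, and compare coefficients
sim2 <- sim[-idx, ]
fit2 <- lm(y ~ ., data = sim2)
cbind(with_point = coef(fit1), without_point = coef(fit2))
```

Whether the point counts as influential is a judgment made by comparing the two fits: coefficient estimates that change noticeably, or predictors that gain or lose significance, once the row is removed point toward an influential observation.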
