Logistic Regression - ANSWER-Commonly used for modeling binary response data. The response variable is a binary variable and thus not normally distributed.
In logistic regression, we model the probability of a success, not the response variable. In this model, we do not have an error term
g-function - ANSWER-We link the probability of success to the predicting variables using the g link function. The g function is the S-shaped function that models the probability of success with respect to the predicting variables
The link function g is the log of the ratio of p over one minus p, where p again is the probability of success
Logit function (log odds function) of the probability of success is a linear model in the predicting variables
The probability of success is equal to the ratio between the exponential of the linear combination of the predicting variables over 1 plus this same exponential
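In symbols, combining the two statements above (with predicting variables x1, ..., xp):
g(p) = ln(p / (1 - p)) = β0 + β1x1 + ... + βpxp
p = exp(β0 + β1x1 + ... + βpxp) / (1 + exp(β0 + β1x1 + ... + βpxp))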
Odds of a success - ANSWER-This is the exponential of the Logit function
Logistic Regression Assumptions - ANSWER-Linearity: The relationship between g of the probability of success and the predicting variables is linear.
Independence: The response binary variables are independently observed
Logit: The logistic regression model assumes that the link function g is a logit function
Linearity Assumption - ANSWER-The Logit transformation of the probability of success is a linear combination of the predicting variables. The relationship may not be linear, however, and transformation may improve the fit
The linearity assumption can be evaluated by plotting the logit of the success rate versus the predicting variables.
If there is curvature or some other non-linear pattern, it may indicate that a lack of fit is due to non-linearity with respect to some of the predicting variables
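A minimal sketch of this check in Python (assuming pandas/matplotlib are available; the helper name and the binning into 10 quantile groups are illustrative choices, not from the source):

```python
# Check linearity: plot the empirical logit of the success rate vs a predictor.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def empirical_logit_plot(x, y, bins=10):   # hypothetical helper
    df = pd.DataFrame({"x": x, "y": y})
    df["bin"] = pd.qcut(df["x"], q=bins, duplicates="drop")
    grouped = df.groupby("bin", observed=True)
    n = grouped["y"].size()
    # add 0.5 to the counts to avoid log(0) in bins where all responses are 0 or 1
    p = (grouped["y"].sum() + 0.5) / (n + 1.0)
    plt.scatter(grouped["x"].mean(), np.log(p / (1 - p)))
    plt.xlabel("predicting variable")
    plt.ylabel("logit of success rate")
    plt.show()
```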
Logistic Regression Coefficient - ANSWER-We interpret the regression coefficient beta as the log of the odds ratio for an increase of one unit in the predicting variable
We do not interpret beta with respect to the response variable but with respect to the odds of success
The estimators for the regression coefficients in logistic regression are approximately unbiased, and thus the mean of the approximate normal distribution is beta. The variance of the estimator does not have a closed-form expression
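For example, if the estimated coefficient is β̂j = 0.5, the odds ratio for a one-unit increase in xj is e^0.5 ≈ 1.65, i.e. the odds of success are multiplied by about 1.65.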
Model parameters - ANSWER-The model parameters are the regression coefficients.
There is no additional parameter to model the variance since there's no error term.
For P predictors, we have P + 1 regression coefficients for a model with intercept (beta 0).
We estimate the model parameters using the maximum likelihood estimation approach
Response variable - ANSWER-The response data are Bernoulli, i.e. binomial with one trial, with some probability of success
MLE - ANSWER-The resulting log-likelihood function to be maximized is very complicated and non-linear in the regression coefficients beta 0, beta 1, ..., beta p
MLE has good statistical properties under the assumption of a large sample size i.e. large N
For large N, the sampling distribution of MLEs can be approximated by a normal distribution
The least squares estimation for the standard regression model is equivalent to MLE under the assumption of normality.
MLE is the most applied estimation approach
Parameter estimation - ANSWER-Maximizing the log likelihood function with respect to beta0, beta1 etc in closed (exact) form expression is not possible because the log likelihood function is a non-linear function in the model parameters i.e. we cannot derive the estimated regression coefficients in an exact form
Use a numerical algorithm to estimate the betas (maximize the log likelihood function). The estimated parameters and their standard errors are approximate estimates
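A minimal sketch of this fit (assuming the statsmodels library and simulated data; the true coefficients 0.5 and 1.2 are made up for illustration):

```python
# Fit a logistic regression by numerical maximum likelihood (Newton-type).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))  # simulated binary data

X = sm.add_constant(x)          # adds the intercept column (beta 0)
res = sm.Logit(y, X).fit()      # iterative maximization of the log-likelihood
print(res.params)               # approximate estimates of beta 0 and beta 1
print(res.bse)                  # their approximate standard errors
```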
Binomial Data - ANSWER-This is binary data with repetitions
Marginal Relationship - ANSWER-Capturing the association of a predicting variable to the response variable without consideration of other factors
Conditional Relationship - ANSWER-Capturing the association of a predicting variable to the response variable conditional on other predicting variables in the model
Simpson's paradox - ANSWER-This is when the addition of a predicting variable reverses the sign of the coefficient of an existing predictor
It refers to a reversal of an association when looking at a marginal relationship versus a partial or conditional one. This is a situation where the marginal relationship yields the wrong sign
This happens when the two predicting variables are correlated
Normal Distribution - ANSWER-The approximate normal sampling distribution of the estimated coefficients relies on a large sample of data. Using this approximate normal distribution we can further derive confidence intervals.
Since the distribution is normal, the confidence interval is the z-interval
**Applies for Logistic & Poisson Regression
Hypothesis Testing (coefficient == 0) - ANSWER-To perform hypothesis testing, we can use the approximate normal sampling distribution.
The resulting hypothesis test is also called the Wald test since it relies on the large sample normal approximation of MLEs
To test whether the coefficient betaj = 0 or not, we can use the z-value
**Applies for Logistic & Poisson Regression
Wald Test (Z-test) - ANSWER-The z-test value is the estimated coefficient minus 0 (the null value), divided by its standard error
We reject the null hypothesis that the regression coefficient is 0 if the z-value is larger in absolute value than the z critical point, i.e. the 1 - alpha/2 quantile of the standard normal distribution.
We interpret that the coefficient is statistically significant
**Applies for Logistic & Poisson Regression
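In symbols (from the definition above):
z-value = (β̂j - 0) / SE(β̂j); reject H0: βj = 0 if |z-value| > z(1 - α/2)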
Hypothesis Testing (coefficient == constant) - ANSWER-To test whether the regression coefficient is equal to some constant b, the z-value changes.
We subtract b from the estimated coefficient in the numerator
We decide to reject/accept using the P-value
The P-value is 2 times the probability that a standard normal variable exceeds the absolute value of the z-value
P-value = 2P(Z > |z-value|)
**Applies for Logistic & Poisson Regression
Hypothesis testing (statistical significance: +/-) - ANSWER-Here, the z-value is the same but the P-value will change
Positive:
P-value = P(Z > z-value)
Negative:
P-value = P(Z < z-value)
**Applies for Logistic & Poisson Regression
Statistical Inference - ANSWER-Logistic Regression: Normal distribution. The statistical inference based on the normal distribution applies only under large sample data. If the sample size n is small, the statistical inference is not reliable, i.e. we should warn about the lack of reliability of the results
Standard Regression: t-distribution. The statistical inference relies on the t-distribution, which applies under both small and large samples
**Applies for Logistic & Poisson Regression
Type I Error - ANSWER-This happens if the sample size n is small. The hypothesis testing procedure will have a probability of Type I error larger than the significance level (i.e. more Type I errors than expected)
**Applies for Logistic & Poisson Regression
Deviance - ANSWER-This is twice the difference between the log-likelihood of the full model and the log-likelihood of the reduced model
For large sample size data, the distribution (assuming the null hypothesis is true), is a chi square distribution with Q degrees of freedom
Q = number of Z predicting variables (the additional controlling variables), i.e. the number of regression coefficients discarded from the full model to get the reduced model
The P-value of the test is computed as the right tail of the chi-square distribution with Q degrees of freedom, evaluated at the test value (the deviance)
**This test is NOT a goodness of fit test. It simply compares two models and decides whether the larger model is statistically significantly better than the reduced model.
Coefficient Test (Deviance) - ANSWER-The hypothesis testing procedure is testing the null hypothesis that all alpha coefficients are zero, versus the alternative that at least one alpha coefficient is not zero
For the testing procedure for subsets of coefficients, we compare the likelihood of a reduced model versus a full model.
This test provides inferences on the predictive power of the model. Predictive power means that the predicting variables predict the data even if one or more of the assumptions do not hold
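In symbols (a sketch of the test described in the last two cards, with l denoting the maximized log-likelihood):
D = 2 * (l_full - l_reduced), which under the null hypothesis has a chi-square distribution with Q degrees of freedom
P-value = P(χ²_Q > D)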
Overall Regression - ANSWER-Standard Regression: We use the F test to test for the overall regression
Logistic Regression: We use the difference between the deviance of the model under the null hypothesis (called the null deviance) and the deviance of the full model (called the residual deviance), i.e. the difference between the null deviance and the residual deviance
Overall Regression (Logistic) - ANSWER-Under the null hypothesis, the test statistic has a chi-squared distribution with p degrees of freedom, where p is the number of predicting variables.
We reject the null hypothesis when the P-value is small, indicating that the overall regression has explanatory power.
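A minimal sketch of the overall regression test (assuming statsmodels and scipy; the simulated data are illustrative):

```python
# Overall regression test: null deviance minus residual deviance vs chi-square(p).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
res = sm.GLM(y, X, family=sm.families.Binomial()).fit()
test_stat = res.null_deviance - res.deviance   # null deviance - residual deviance
p = X.shape[1] - 1                             # number of predicting variables
print("p-value:", stats.chi2.sf(test_stat, df=p))
```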
Data w/ replications vs Data w/o replications - ANSWER-Data with replications:
We can observe binary data for repeated trials. That is a binomial distribution with more than one trial or ni greater than 1
Data without replications:
For each unique set of the observed predicting variables, we can observe binary data with no repeated trials. That is a binomial distribution with one trial where ni = 1
Logistic Regression with replications - ANSWER-Residuals: We can only define residuals for binary data with replications
Goodness of Fit: We perform goodness of fit only for logistic regression with replications under the assumption that Yi is binomial with ni greater than 1
Pearson Residuals - ANSWER-This is the standardized difference between the ith observed response and estimated expected response, which is ni times the probability of success
We need to standardize the difference between observed and expected response, as the responses have different variances
Pearson residuals have an approximately standard normal distribution
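In symbols (using the binomial variance ni pi (1 - pi), a standard fact not spelled out above):
ri = (yi - ni p̂i) / √(ni p̂i (1 - p̂i))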
Deviance Residuals - ANSWER-These are the signed square roots of each observation's contribution to the deviance, which compares the log-likelihood of the saturated model (where the estimated expected response is taken to be the observed response) with that of the fitted model
Deviance residuals have an approximately standard normal distribution if the model is a good fit (i.e. model assumptions hold)
Goodness of Fit - ANSWER-We can use the Pearson or Deviance residuals to evaluate whether they are normally distributed. If they're normally distributed, we conclude that the model is a good fit
If the model is not a good fit, it means the linearity assumption may not hold
Goodness of Fit Test - ANSWER-The null hypothesis is that the model fits well. The alternative is that the model does not fit well
The test statistic for the goodness of fit test is the sum of the squared deviance residuals, which has a chi-square distribution with n - p - 1 degrees of freedom
If the p-value is small, we reject the null hypothesis of good fit, and thus we conclude that the model is not a good fit. We want LARGE p-values; a large p-value indicates that the model may be a good fit
For goodness of fit test, we compare the likelihoods of the saturated model versus the fitted model.
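A minimal sketch of the test (assuming statsmodels and scipy, and binomial data with replications, i.e. each row records successes out of ni = 25 trials; all numbers are illustrative):

```python
# Goodness-of-fit: compare the deviance to chi-square with n - p - 1 df.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 30)
n_trials = np.full(30, 25)                 # ni = 25 replications per level of x
successes = rng.binomial(n_trials, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
Y = np.column_stack([successes, n_trials - successes])   # (successes, failures)
res = sm.GLM(Y, X, family=sm.families.Binomial()).fit()

p_value = stats.chi2.sf(res.deviance, df=res.df_resid)   # right tail of chi-square
print("p-value:", p_value)   # a large p-value suggests the model may fit well
```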
Goodness of Fit (binary data with no replications) - ANSWER-Use the deviances from the aggregated model for goodness of fit, not based on the individual-level data
Reasons why a model may not be a good fit - ANSWER-There may be other variables that should be included in the model
The relationship between Logit of the expected probability and predictors might be multiplicative, rather than additive
Departure from the linearity assumption
Influential observations, outliers, and leverage points are also still an issue for this model
Logit function does not fit well with the data
The binomial distribution isn't appropriate. For example, if there's correlation among the responses or there's heterogeneity in the success probability that hasn't been modeled. Both of these violations can lead to what we call overdispersion
Overdispersion - ANSWER-This is where the variability of the probability estimates is larger than would be implied by a binomial random variable
ɸ = D/(n-p-1)
D is the Deviance(sum of squared deviances)
If ɸ > 2 then the model is overdispersed; a model that accounts for overdispersion will fit better
Overdispersion impacts the estimated variance and statistical inference. If overdispersion is not accounted for, statistical inference will not be as reliable
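Continuing the goodness-of-fit sketch above (reusing its fitted `res` object), the dispersion estimate is one line:

```python
phi = res.deviance / res.df_resid   # ɸ = D / (n - p - 1)
print("phi:", phi)                  # phi > 2 suggests overdispersion
```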
Link Functions - ANSWER-Complementary log-log (c-log-log) function: This has very long tails, meaning that it works best for extremely skewed distributions
Probit Function: This is the inverse of the CDF of a standard normal distribution. This fits data with least-heavy tails among the three S shaped functions. This would work well when the probabilities are all concentrated within a small range
Logit Function: This is what is called the canonical link function, which means that parameter estimates under logistic regression are fully efficient and tests on those parameters are better behaved for small samples. The interpretations of regression coefficients in terms of log odds is possible with a logit function but not other S-shape functions
Classification - ANSWER-Classification is prediction of binary responses.
If the predicted probability is larger than the threshold R, then classify y star as a success
Classification Error Rate - ANSWER-Classification error rate is the probability that the new response is not equal to its classification under the classifier with threshold R
R is between 0 and 1. The most common value for R is 0.5; however, a different R can be used to improve the prediction accuracy
Training Error - ANSWER-This is the proportion of the responses that are misclassified
We cannot use the training error rate as an estimate of the true classification error rate because it is biased downward
The bias comes from the fact that we use the data twice: once for fitting the model and a second time to estimate the classification error rate
Cross Validation - ANSWER-This is a direct measure of predictive power
Random sampling is computationally more expensive than K-fold cross validation, with no clear advantage in terms of the accuracy of the estimated classification error rate
The rule of thumb for choosing K is about K = 10
LOOCV is a K-fold cross validation with K = n. The larger K is (the more folds), the less biased the estimate of the classification error, but the higher its variability.
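A minimal sketch of a 10-fold cross-validation error estimate (assuming scikit-learn; the threshold R = 0.5 is scikit-learn's default for class prediction):

```python
# Estimate the classification error rate by 10-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 1))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x[:, 0]))))

acc = cross_val_score(LogisticRegression(), x, y, cv=10, scoring="accuracy")
print("estimated classification error rate:", 1 - acc.mean())
```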
LOOCV - ANSWER-LOOCV can be approximated by the sum of the training risk and a complexity penalty.
The complexity penalty is (2 * # of predictors in submodel * estimated_variance of submodel)/n
The variability of the submodel is smaller than that of the full model, thus LOOCV penalizes complexity less than Mallows' Cp
LOOCV is approximately AIC when the true variance is replaced by the estimate of the variance from the submodel
Poisson Regression - ANSWER-The response Y in Poisson regression is assumed to have a Poisson distribution, and this is commonly used for modeling count or rate data.
We assume that the i-th response Yi has a Poisson distribution, with rate lambda i. Alternatively, log of the rate lambda i is equal to the linear combination of the predicting variables
We do not interpret beta with respect to the response variable but with respect to the ratio of the rate
There is no error term
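A minimal sketch of a Poisson fit (assuming statsmodels; the simulated rates are illustrative):

```python
# Poisson regression: log of the rate is linear in the predicting variable.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.3 + 0.7 * x))   # simulated count data

X = sm.add_constant(x)
res = sm.GLM(y, X, family=sm.families.Poisson()).fit()   # log link by default
print(res.params)   # estimates of beta 0, beta 1 on the log-rate scale
```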
Poisson Regression Assumptions - ANSWER-Linearity: The log transformation of the rate is a linear combination of the predicting variables.
Independence: The response variables are independently observed
Link: The link function g is the log function. The log link function is almost always used
Linearity Assumption - Poisson - ANSWER-Linearity can be evaluated by plotting the log of the event rate versus the predicting variables
We can also evaluate linearity, together with the assumption of uncorrelated responses, using scatterplots of the residuals versus the predicting variables
Generalized Linear Models (GLM) - ANSWER-Here, the response Y is assumed to have a distribution from the exponential family of distributions (Normal, Binomial, Poisson, Gamma etc)
Under this model, we model a transformation g of the expectation of Y, given the predicting variables as a linear combination of the predicting variables
We can write the expectation as the inverse of the g transformation of the linear combination of the predicting variables
**Include table w/ link function & regression function pg 67
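A standard version of that table, using only the distributions and links named in this section:
Distribution | Link function g(µ) | Regression function E(Y|x)
Normal | identity: µ | β0 + β1x1 + ... + βpxp
Binomial | logit: ln(p/(1-p)) | exp(β0 + ... + βpxp) / (1 + exp(β0 + ... + βpxp))
Poisson | log: ln(λ) | exp(β0 + β1x1 + ... + βpxp)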
G transformation - ANSWER-The transformation g is called a link function since it links the expectation of the response to the predicting variables
Poisson Regression vs Log transformed Linear Regression - ANSWER-Standard Regression: We estimate the expectation of the log of the response - E(log(Y))
The variance under the standard regression is assumed constant
Poisson Regression: We estimate the log of the expectation of the response - log(E(Y))
The variance of the response is assumed to be equal to the expectation; thus, the variance is not constant.
**Use the Poisson regression especially when the response data are small counts
**Using the standard linear regression with log transformation instead of Poisson regression, will result in violations of the assumption of constant variance
**Standard linear regression could be used if the counts are large, with the variance-stabilizing transformation √(µ + 3/8), i.e. the square root of the response plus 3/8. This transformation works well when the response data are large counts
Log Rate - ANSWER-This is the log function of the expected value of the response
ln(λ(x)) = β0 + β1x
Regression Coefficient - ANSWER-The regression coefficient is interpreted as the log of the ratio of rates for a one-unit increase in the predicting variable
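This follows from the model above: λ(x+1)/λ(x) = exp(β1), so a one-unit increase in x multiplies the expected rate by e^β1.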