Building simpler models with fewer factors helps avoid which problems?
A. Overfitting
B. Low prediction quality
C. Bias in the most important factors
D. Difficulty in interpretation
A. Overfitting
D. Difficulty in interpretation
Two main reasons to limit # of factors in a model.
1. Overfitting
2. Simplicity
When is overfitting likely to happen?
When the number of factors is close to the number of data points.
How does using a # of factors that is close to the number of data points cause overfitting?
The model fits the random effects too closely. It fits that data set well, but fails to predict well on a new data set.
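An illustrative sketch of the card above (synthetic data, not from the cards): when the number of factors is close to the number of data points, an ordinary linear model fits the training noise almost perfectly but predicts poorly on new data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_train, n_features = 25, 20                    # factors close to data points
X_train = rng.normal(size=(n_train, n_features))
y_train = X_train[:, 0] + rng.normal(scale=0.5, size=n_train)  # only 1 real factor

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)        # near-perfect: fits the noise

X_test = rng.normal(size=(200, n_features))
y_test = X_test[:, 0] + rng.normal(scale=0.5, size=200)
test_r2 = model.score(X_test, y_test)           # much lower: noise doesn't generalize
print(train_r2, test_r2)
```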
Three reasons simple models are better than complex models.
1. Less data is required
2. Less chance of including insignificant factors
3. Easier to interpret
Which of these is a key difference between stepwise regression and lasso regression?
A. Lasso regression requires the data to first be scaled
B. Stepwise regression gives many models to choose from, while lasso gives just one.
A. If the data isn't scaled, the coefficients can have artificially different orders of magnitude, which means they'll have unbalanced effects on the lasso constraint.
Name three greedy approaches to variable selection.
1. Forward Selection
2. Backward Elimination
3. Stepwise Regression
Name two global approaches to variable selection.
1. Lasso
2. Elastic Net
What does a greedy approach to variable selection mean?
At each step, the model does the one thing that looks best, without taking future options into consideration. A more classical approach.
How does forward selection differ from backward elimination?
Forward selection starts with zero factors and backward elimination starts with all factors.
What is stepwise regression?
A combination of forward selection and backward elimination. At each step, a variable is considered for addition or elimination based on some prespecified criterion.
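A sketch of the greedy approaches using scikit-learn's `SequentialFeatureSelector` (it supports forward selection and backward elimination; it does not implement full add-and-drop stepwise regression). The data here is synthetic, with the true signal on features 0 and 3.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=100)

# Forward selection: start with zero factors, greedily add
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward").fit(X, y)
# Backward elimination: start with all factors, greedily remove
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="backward").fit(X, y)
print(forward.get_support(indices=True))
print(backward.get_support(indices=True))
```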
Name two approaches to use with stepwise regression.
1. Start with all factors
2. Start with no factors
What restriction does the lasso approach add?
The sum of the absolute values of the coefficients can't exceed a threshold, t.
How does lasso use the t value?
The t budget gets spent on the most important coefficients; the rest are driven to zero, so those factors won't be part of the model.
What do you need to do with the data whenever you're constraining the sum of coefficients?
Scale the data.
The value of t in the lasso approach depends on what two things?
1. The number of variables you want.
2. The quality of the model as you allow more variables
What is the best approach to find the best value of t with lasso?
Try lasso with different values of t and pick the value that has the best tradeoff between number of variables and quality of the model.
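A sketch of that approach in scikit-learn on synthetic data. Note one assumption to flag: scikit-learn's `Lasso` parametrizes the constraint through a penalty weight `alpha` (larger alpha corresponds to a smaller budget t) rather than through t directly. The data is scaled first, as the cards require.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 10))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=150)

X_scaled = StandardScaler().fit_transform(X)    # balance the constraint
results = {}
for alpha in (0.01, 0.1, 1.0):
    lasso = Lasso(alpha=alpha).fit(X_scaled, y)
    n_vars = int(np.sum(np.abs(lasso.coef_) > 1e-8))
    results[alpha] = (n_vars, lasso.score(X_scaled, y))
    print(alpha, results[alpha])
# Tighter budgets keep fewer variables; pick the value with the best
# tradeoff between number of variables and quality of the model.
```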
What constraint does Elastic Net add?
Elastic Net constrains a combination of the sum of the absolute values of the coefficients and the sum of their squares.
Name two things similar about Lasso and Elastic Net.
1. You have to scale the data for both.
2. You have to pick the best value of t for both.
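A sketch of Elastic Net in scikit-learn, which combines the lasso (absolute-value) and ridge (squared) penalties; the `l1_ratio` parameter sets the mix, and `alpha` again stands in for the budget t. Synthetic data, scaled first as with lasso.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 10))
y = 5 * X[:, 0] + rng.normal(scale=0.5, size=150)

X_scaled = StandardScaler().fit_transform(X)    # scaling required here too
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_scaled, y)
n_nonzero = int(np.sum(np.abs(enet.coef_) > 1e-8))
print(n_nonzero)    # the absolute-value part still zeroes out weak factors
```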
Do you have to scale the data for Elastic Net?
Yes
What change can you make to elastic net to get a model called ridge regression?
Remove the absolute value term.
Does ridge regression do variable selection?
No.
What is the advantage of ridge regression?
It can lead to better predictive models.
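A sketch contrasting ridge regression with plain least squares on synthetic, scaled data: the quadratic penalty shrinks the coefficients toward zero but does not make any of them exactly zero, so no variable selection happens.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = StandardScaler().fit_transform(rng.normal(size=(100, 6)))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print(np.abs(ridge.coef_).sum() < np.abs(ols.coef_).sum())  # shrinkage happened
print(np.all(ridge.coef_ != 0))                             # but nothing is zeroed
```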
When two predictors are highly correlated, which of the following statements is true?
A. Lasso regression will usually have non-zero coefficients for both predictors.
B. Ridge regression will usually have non-zero coefficients for both predictors.
B. Ridge regression will choose smaller (in an absolute sense) non-zero coefficients for both predictors. By nature, it may underestimate the effect of the factors.
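A sketch of the correlated-predictor behavior on a synthetic setup (assumed for illustration): two nearly identical predictors, where lasso tends to keep one and drive the other to (near) zero, while ridge keeps non-zero coefficients for both.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)           # highly correlated with x1
X = StandardScaler().fit_transform(np.column_stack([x1, x2]))
y = x1 + x2 + rng.normal(scale=0.2, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(lasso.coef_)  # typically one coefficient is near zero
print(ridge.coef_)  # both non-zero, split between the predictors
```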
Which variable selection methods are good for initial analysis?
Forward selection
Backward elimination
Stepwise regression
What is one of the drawbacks to forward selection, backward elimination and stepwise regression?
They can give you a set of variables that fits more to random effects than you would like, making it appear that you have a better fit than you actually do.
What is the advantage to Lasso and elastic net?
They tend to give better predictive models, but are slower to compute.
Which methods are combined to make up Elastic Net?
Lasso and Ridge regression.
What does the absolute value factor in Lasso do?
Helps decrease the number of factors with non-zero coefficients.
What does the quadratic term in ridge regression do?
Shrink the coefficient values.
It pushes them toward zero, or regularizes them (it's a shrinkage method)
What is the difference between the quadratic term in Ridge regression and the absolute value term in Lasso?
Lasso's absolute value term makes some coefficients equal zero. Ridge regression's quadratic term shrinks the coefficients, but they won't equal zero.