greedy algorithm
at each step, the algorithm does the thing that looks best without taking future options into consideration; more classical
variable selection methods
stepwise - (forward, backward, combination)
lasso
elastic net
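A minimal sketch of the forward variant of stepwise selection, to show the greedy idea: each round adds the one variable that improves the criterion most and never revisits earlier choices. The names `forward_select`, `toy_score`, and `min_gain` are hypothetical, and the toy criterion only stands in for a real one (for example adjusted R-squared or negative AIC from a fitted model).

```python
def forward_select(candidates, score, min_gain=0.0):
    """Greedy forward selection: add the single best variable each round."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current model.
        trials = [(score(selected + [c]), c) for c in remaining]
        new_best, choice = max(trials)
        if new_best - best <= min_gain:
            break  # no candidate improves the criterion enough, so stop
        selected.append(choice)
        remaining.remove(choice)
        best = new_best
    return selected


def toy_score(variables):
    """Hypothetical criterion (higher is better) with a per-variable penalty."""
    value = {"x1": 0.50, "x2": 0.30, "x3": 0.01}
    return sum(value[v] for v in variables) - 0.05 * len(variables)


chosen = forward_select(["x1", "x2", "x3"], toy_score)
print(chosen)
```

Backward elimination would start from all variables and drop the least useful one each round; the combination approach allows both moves.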
available metrics for variable selection criteria
p-value
R-squared (R²)
AIC / BIC
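For reference, the usual definitions of the two information criteria, where $k$ is the number of estimated parameters, $n$ the number of data points, and $L^*$ the maximized likelihood (symbols chosen here, not from the notes). Lower values are preferred, and BIC penalizes extra parameters more heavily once $\ln n > 2$.

```latex
\mathrm{AIC} = 2k - 2\ln L^{*}
\qquad
\mathrm{BIC} = k\ln n - 2\ln L^{*}
```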
lasso
Gives the regression a budget for the sum of the absolute values of the coefficients, which it spends on the most important coefficients
Have to scale the data first
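One common way to write the budget idea, assuming $m$ predictors $x_{ij}$, coefficients $a_j$, and a budget $\tau$ (notation chosen here for illustration):

```latex
\min_{a_0,\dots,a_m} \sum_{i=1}^{n} \Bigl( y_i - a_0 - \sum_{j=1}^{m} a_j x_{ij} \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{m} \lvert a_j \rvert \le \tau
```

Because the constraint sums raw coefficient magnitudes, predictors on different scales would be penalized unequally, which is why scaling comes first.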
elastic net
Constrains a combination of the absolute values of the coefficients and their squares
Have to scale the data first
Note: if the absolute-value term is removed, this becomes ridge regression (to be covered later)
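A sketch of the constraint in one common parameterization, assuming coefficients $a_j$, a budget $\tau$, and a mixing weight $\lambda \in [0, 1]$ (symbols chosen here for illustration). The objective is the same sum of squared errors as in ordinary regression.

```latex
\lambda \sum_{j=1}^{m} \lvert a_j \rvert + (1 - \lambda) \sum_{j=1}^{m} a_j^{2} \le \tau
\qquad
\text{ridge } (\lambda = 0): \; \sum_{j=1}^{m} a_j^{2} \le \tau
```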
prediction error
function of bias and variance
model complexity on x axis
error on y axis
bias squared decreasing, variance increasing, prediction error a U-shaped curve with its minimum (ish) where bias squared = variance
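The standard decomposition behind this picture, assuming data of the form $y = f(x) + \varepsilon$ with noise variance $\sigma^2$ and a fitted model $\hat f$ (notation chosen here). The last term is irreducible noise that no model removes.

```latex
\mathbb{E}\bigl[(y - \hat f(x))^{2}\bigr]
= \underbrace{\bigl(\mathbb{E}[\hat f(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}\bigl[(\hat f(x) - \mathbb{E}[\hat f(x)])^{2}\bigr]}_{\text{variance}}
+ \sigma^{2}
```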
design of experiment
used as a means to quickly and efficiently get a subset of data
Ex: polls, which similar products a retailer should display
Keep in mind comparison / control and blocking
blocking factor
something that could create variance in DOE
A/B Testing
experiment design to test two alternatives; binomial data is used to answer (with a hypothesis test) which is the better choice
Experiment can be run n times or until results are significant
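A minimal sketch of one hypothesis test that fits binomial A/B data: a pooled two-proportion z-test using only the standard library. The function name and the counts below are hypothetical, and other tests (for example a chi-squared test) would also be reasonable.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both alternatives have the same rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical data: 200 of 1000 conversions for A, 250 of 1000 for B.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(z, p_value)
```

A small p-value suggests the difference in rates is unlikely to be chance alone; the significance threshold is a choice made before running the experiment.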
A/B testing requirements
Quick collection of a lot of data
Data must be representative
The amount of data collected must be small compared to the total size of the use case (the whole population the decision applies to)
A/B testing limitations
Does not address: Several alternatives, Learning as you go, Combination of factors
factorial design
experiment design with more than two alternatives, but a small enough set of scenarios to test all
testing combinations
ANOVA determines importance of each factor
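A short sketch of enumerating every scenario in a full factorial design. The factor names and levels are hypothetical examples; the point is only that the number of scenarios is the product of the number of levels per factor.

```python
from itertools import product

# Hypothetical factors and levels for a web page test.
factors = {
    "font_color": ["black", "blue"],
    "background": ["white", "gray"],
    "layout": ["wide", "narrow"],
}

names = list(factors)
# One dict per scenario, covering every combination of levels.
scenarios = [dict(zip(names, combo)) for combo in product(*factors.values())]

for s in scenarios:
    print(s)
```

With three two-level factors this gives 2 x 2 x 2 = 8 scenarios, each of which would be tested.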
fractional factorial design
experiment design with too many possible scenarios to test comprehensively where a subset of scenarios is tested
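One standard way to pick the subset, shown as a sketch: a half fraction of a three-factor, two-level design, keeping only runs where the product of the coded levels is +1. The notes do not say how the subset is chosen, so this particular construction is an assumption.

```python
from itertools import product

# Full 2^3 design with levels coded as -1 and +1.
full = list(product([-1, 1], repeat=3))

# Half fraction: keep runs where A * B * C == +1 (defining relation I = ABC).
half = [run for run in full if run[0] * run[1] * run[2] == 1]

for run in half:
    print(run)
```

The four remaining runs stay balanced: each factor appears at each level twice, and every pair of factors still sees all four level combinations. The cost is that main effects can no longer be separated from two-factor interactions.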
independent factorial design
experiment design where factors are assumed to be independent; a subset of combinations of choices is tested, and regression is used to estimate effect of each choice; each factor gets a categorical variable
Note that interaction terms are likely necessary; ex: font color and background color
exploration v exploitation
more information v immediate value -- with regard to DOE
multi armed bandit approach
testing k alternatives & starting with no information; run tests and update the information about the probability of each alternative being the best, then assign new tests according to those probabilities. This is exploration, but since the best alternative is more likely to be picked, it is also exploitation
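A minimal sketch of one bandit strategy that matches this description, Thompson sampling with Beta posteriors over success rates. The function name, the true rates, and the number of rounds are hypothetical; the notes do not name a specific algorithm, so this is one possible choice.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit; return per-arm (wins, losses)."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k    # starting with no information: Beta(1, 1) for every arm
    losses = [0] * k
    for _ in range(rounds):
        # Draw a plausible success rate for each arm from its posterior.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = draws.index(max(draws))  # test the arm that looks best this round
        # Simulated outcome; in a real experiment this is the observed result.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


# Hypothetical true success rates, unknown to the algorithm.
wins, losses = thompson_sampling([0.05, 0.10, 0.20], rounds=5000)
pulls = [w + l for w, l in zip(wins, losses)]
print(pulls)
```

Uncertain arms still get sampled now and then (exploration), while arms with strong evidence of being best receive most of the tests (exploitation).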