ISYE 6501 FINAL EXAM WITH COMPLETE
SOLUTION 2022/2023
1. Factor Based Models: classification, clustering, regression. Implicitly assumed that we have a lot of factors in the final model
2. Why limit number of factors in a model? 2 reasons:
- Overfitting: when the # of factors is close to or larger than the # of data points, the model may fit too closely to random effects
- Simplicity: simple models are usually better
3. Classical variable selection approaches (all greedy algorithms):
1. Forward selection
2. Backwards elimination
3. Stepwise regression
4. Backward elimination: variable selection; classical
Opposite of forward selection. Start with a model containing all factors; at each step, find the worst factor and remove it from the model. Continue until no remaining factor is bad enough to remove, or the # of factors threshold is satisfied.
5. Forward selection: variable selection; classical
Start with a model containing no factors; at each step, find the best new factor to add. Continue until no new factor is good enough to add, or the # of factors threshold is satisfied. At the end, remove any factors that are no longer good enough.
6. Stepwise regression: variable selection; classical
Combination of forward selection and backwards elimination. Start with all or no factors; at each step, remove or add a factor. As it continues, after adding a new factor we immediately eliminate any factors that may no longer be good. This helps the model adjust: when new factors are added, the goodness values of the other factors change.
7. Ways of determining if factors are good enough in variable selection: p-value, R-squared, AIC, BIC
8. Greedy algorithm: At each step, it does the one thing that looks best without taking future options into consideration. Good for initial analysis
1. Forward selection
2. Backwards elimination
3. Stepwise regression
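The greedy step is the same in all three classical approaches; here is a minimal Python sketch of forward selection, where the toy data, the sum-of-squared-errors scoring rule, and max_factors are illustrative assumptions, not part of the course material:

```python
import numpy as np

def forward_select(X, y, max_factors):
    """Greedy forward selection: at each step, add the factor that most
    reduces the sum of squared errors (illustrative scoring rule)."""
    chosen, remaining = [], list(range(X.shape[1]))

    def sse(cols):
        # Sum of squared errors of a least-squares fit on the given columns
        A = X[:, cols]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.sum((y - A @ beta) ** 2))

    while remaining and len(chosen) < max_factors:
        best = min(remaining, key=lambda j: sse(chosen + [j]))  # greedy step
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: y depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
print(forward_select(X, y, 2))  # picks the two truly relevant factors
```

Because the algorithm is greedy, each step looks only one factor ahead, which is exactly why the stepwise variant re-checks earlier factors after each addition.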
9. Global variable selection approaches: 1. LASSO
2. Elastic Net
Slower, but tend to give better predictive models
10. LASSO: variable selection; global
- SCALE the data (as with any constrained sum of coefficients)
- Add a constraint to the standard regression equation
- Minimize the sum of squared errors, subject to the constraint
- T = limit or "budget" on how large the sum of the coefficients' absolute values can get. The budget will be spent on the most important coefficients
- Method for limiting the number of variables in a model by limiting the sum of all coefficients' absolute values. Can be very helpful when the number of data points is less than the number of factors.
11. Elastic Net: variable selection; global
- SCALE the data (as with any constrained sum of coefficients)
- T = limit or "budget" on how large the combination of the coefficients' absolute values and squared values can get. The budget will be spent on the most important coefficients
- Combination of lasso and ridge regression.
- Variable selection benefits of LASSO
- Predictive benefits of ridge regression
12. Ridge Regression: - Method of regularization by limiting the sum of the squares of the coefficients. Will reduce the magnitude of coefficients, not the number of variables chosen.
- The quadratic term in ridge regression tends to shrink the coefficient values, i.e. whatever the basic regression model coefficients would be, the quadratic constraint pushes them toward zero (regularizes them).
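The contrast between the three global approaches can be seen on toy data. This sketch assumes scikit-learn is available; the alpha and l1_ratio values and the data are made-up illustrations:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.preprocessing import StandardScaler

# Toy data: only the first two of six factors matter
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# SCALE the data first, as required for constrained-coefficient methods
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# LASSO zeroes out the four unimportant factors (variable selection);
# ridge only shrinks coefficients, so none are exactly zero
print("LASSO zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

The l1_ratio parameter in ElasticNet interpolates between the ridge (0) and LASSO (1) constraints, which is the "combination" the notes describe.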
13. Design of Experiments (DOE): How can we still have a representative sample of each combination of factors while only surveying 600 people? How do we determine which of several factors are most important to predicting someone's answers?
- Comparison: to measure differences
- Control: for other factors and effects
- Blocking: factors that account for the variation between factors (red sports car vs. red minivan example)
14. A/B testing: Whenever we want to choose between 2 alternatives.
As long as the following 3 things are true:
1st, we need to be able to collect a lot of data quickly enough to get an answer in time to use it.
2nd, the data we collect has to be from a representative sample of the whole population.
3rd, the amount of data we collect has to be small compared to the total population we want to use the answer on.
Before modeling and before collecting data
15. (Full) Factorial Design: Test every combination of variables in an experiment to find each one's effect, and interaction effects on the outcome.
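Enumerating a full factorial design is just the cross product of all factor levels; the factors and levels below are hypothetical examples:

```python
from itertools import product

# Hypothetical factors for a survey design
factors = {
    "color": ["red", "blue"],
    "body": ["sports car", "minivan"],
    "trim": ["base", "luxury"],
}

# Full factorial design: every combination of levels
runs = list(product(*factors.values()))
print(len(runs))  # 2 * 2 * 2 = 8 combinations
```

A fractional factorial design would test only a balanced subset of these 8 runs, which is why it scales so much better as the number of factors grows.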
16. Fractional Factorial Design: A subset of combinations to test - selected combinations give the same result as a full factorial design, i.e. a balanced design
Before modeling and before collecting data
17. What approach to take if it is believed the factors we can change are independent? (Factorial design): Test a subset of combinations and use regression to estimate the effect of each choice
Before modeling and before collecting data
18. Exploration vs. Exploitation: Exploration - focusing on getting more information
Exploitation - getting immediate value
19. Multi-Armed Bandit Problem: Exploration/Exploitation principle
Several slot machines, not known which has the highest payout, so must test all
(K) alternatives
1st test = equal probability
2nd test = update probabilities based on the results of the 1st test (we can also change: # of tests, how we update probabilities, how we assign new tests)
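A common minimal sketch of this trade-off is the epsilon-greedy rule: explore a random machine with small probability, otherwise exploit the best one seen so far. The payout probabilities and epsilon below are made-up values:

```python
import random

random.seed(1)
payout = [0.2, 0.5, 0.8]   # true (unknown) win probability per machine
counts = [0, 0, 0]         # pulls per machine
wins = [0, 0, 0]           # wins per machine
epsilon = 0.1              # exploration rate (illustrative)

for _ in range(5000):
    if random.random() < epsilon or 0 in counts:
        arm = random.randrange(3)  # explore: try a random machine
    else:
        # exploit: play the machine with the best observed win rate
        arm = max(range(3), key=lambda a: wins[a] / counts[a])
    counts[arm] += 1
    wins[arm] += random.random() < payout[arm]

print(counts)  # pulls concentrate on the highest-payout machine
```

More sophisticated schemes change how probabilities are updated and how new tests are assigned, exactly the knobs the notes mention.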
20. What needs to be the case when matching data to a probability distribution to gain insight based on how the distribution is derived?: The only information we have about a data point is the response, or when it would be hard to collect and analyze additional information
21. What is the Bernoulli distribution useful to model?: A single yes/no event, e.g. flipping a coin; will it rain or not?; will I get this job offer or not?
Only really useful when you put many of them together (flip a coin 10,000 times)
22. Describe a Bernoulli distribution in terms of a coin toss test: Probability (p) that a single coin flip comes up heads and probability (1-p) that the coin comes up tails
23. Define a Binomial distribution: The probability of getting x yes answers out of n independent Bernoulli trials, each with the probability p
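The definition above translates directly into code using only the standard library (a minimal sketch of the binomial probability mass function):

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli
    trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(exactly 5 heads in 10 fair coin flips) = C(10,5) / 2^10
print(round(binom_pmf(5, 10, 0.5), 4))  # 0.2461
```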
24. When is the normal distribution useful as an estimate for the Binomial distribution?: When n is large, and for modeling errors (predictive models)
25. What is the question to describe a Geometric distribution?: How many (Bernoulli) trials are needed before we get an answer of a certain type?
26. What is the Poisson distribution good at modeling?: Random arrivals of people to lines, queues, etc.
- The function gives the probability that x people arrive, given the average arrival rate (lambda)
- Assumes arrivals are independent and identically distributed (i.i.d.)
27. What is the Exponential distribution good at modeling?: The time between arrivals or trials (inter-arrival time)
28. How are the Poisson and Exponential distributions related?: If arrivals are Poisson with arrival rate lambda, then the time between arrivals (inter-arrival time) follows the Exponential distribution with mean 1/lambda.
The same is true in reverse: if inter-arrival times are Exponential, arrivals are Poisson.
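This relationship is easy to check by simulation; the rate lambda = 4 here is an arbitrary illustrative value:

```python
import random

random.seed(0)
lam = 4.0  # average arrivals per unit time (illustrative)

# Exponential inter-arrival times with rate lambda
gaps = [random.expovariate(lam) for _ in range(100_000)]

# Mean inter-arrival time should be about 1/lambda = 0.25
print(round(sum(gaps) / len(gaps), 2))  # 0.25
```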
29. When k = 1, the Weibull is what?: An Exponential distribution.
A constant failure rate (Weibull with k = 1) is the same as a constant arrival rate (Exponential).
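The k = 1 special case can be verified numerically; the scale value is arbitrary:

```python
import random

random.seed(7)
scale = 2.0  # illustrative scale parameter

# Weibull with shape k = 1 reduces to an Exponential distribution,
# so its mean equals the scale parameter (an exponential mean)
w = [random.weibullvariate(scale, 1.0) for _ in range(100_000)]
print(round(sum(w) / len(w), 1))  # ~ 2.0, matching the scale
```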
30. What is the Weibull distribution useful to model?: The amount of time it takes for something to fail, specifically the time between failures.
31. Describe a Q-Q plot: Whatever variations in the data there might be and even if the number of data points in two sets is very different, 2 similar distributions should have about the same value at each quantile. Could also use to match to a probability distribution (just calculate theoretical values of quantiles following the distro)
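A numeric version of the Q-Q idea using only the standard library; the sample sizes and the 0.3 tolerance are illustrative choices:

```python
import random
import statistics

random.seed(3)
# Two samples from the same distribution, with very different sizes
a = [random.gauss(0, 1) for _ in range(2000)]
b = [random.gauss(0, 1) for _ in range(500)]

# Deciles (9 cut points) of each sample; a Q-Q plot would plot qa vs qb
qa = statistics.quantiles(a, n=10)
qb = statistics.quantiles(b, n=10)

# Matching distributions give roughly equal values at each quantile
print(all(abs(x - y) < 0.3 for x, y in zip(qa, qb)))  # True
```

To match data against a theoretical distribution instead, replace one sample's quantiles with the distribution's theoretical quantiles.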
32. When k < 1, the Weibull is good for modeling what?: When failure rate decreases with time.
Worst things fail first
33. What 2 probability distributions are memoryless?: Poisson and Exponential
34. When k > 1, the Weibull is good for modeling what?: When failure rate increases with time
Things that wear out