The goal of machine learning is to:
design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise
The goal of learning is to:
find a model and its corresponding parameters such that the resulting predictor will perform well on unseen data
Learning can be understood as:
a way to automatically find patterns and structure in data by optimizing the parameters of the model
Good models can also be thought of as:
simplified versions of the real (unknown) data-generating process, capturing aspects that are relevant for modeling the data and extracting hidden patterns from it
Training a model means:
to use the data available to optimize some parameters of the model with respect to a utility function that evaluates how well the model predicts the training data
The six main foundational subjects of machine learning are:
1. Analytic Geometry
2. Linear Algebra
3. Matrix Decomposition
4. Optimization
5. Probability & Distributions
6. Vector Calculus
The motivation for analytic geometry is:
Given two vectors representing two objects in the real world, we want to make statements about their similarity. The idea is that vectors that are similar should be predicted to have similar outputs by our machine learning algorithm (our predictor). To formalize the idea of similarity between vectors, we need to introduce operations that take two vectors as input and return a numerical value representing their similarity
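One such operation is the inner (dot) product, often normalized into cosine similarity. A minimal sketch in plain Python (function names are illustrative, not from any particular library):

```python
import math

def dot(u, v):
    # Inner product: a single number built from two vectors
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Normalized inner product in [-1, 1]; 1 means same direction
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # parallel vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```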
Vector calculus and optimization work together in machine learning to:
train machine learning models, where we typically find parameters that maximize some performance measure. Many optimization techniques require the concept of a gradient, which tells us the direction in which to search for a solution.
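Gradient-based search can be sketched with a toy one-dimensional example (the function and step size are chosen only for illustration):

```python
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3);
# the minimum is at x = 3.
def grad(x):
    return 2 * (x - 3)

x = 0.0
lr = 0.1  # step size (a hyperparameter)
for _ in range(100):
    x -= lr * grad(x)  # step opposite the gradient direction

print(round(x, 4))  # -> close to 3.0
```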
The key objective of dimensionality reduction is to:
find a compact, lower-dimensional representation of high-dimensional data, which is often easier to analyze than the original data
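A common instance is principal component analysis. A minimal NumPy sketch on synthetic data (the dataset is hypothetical; this keeps only the direction of largest variance):

```python
import numpy as np

rng = np.random.default_rng(0)
# 2-D data that mostly varies along one direction
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                    # center the data
vals, vecs = np.linalg.eigh(np.cov(Xc.T))  # eigen-decompose the covariance
pc = vecs[:, np.argmax(vals)]              # direction of largest variance
Z = Xc @ pc                                # compact 1-D representation
print(Z.shape)  # (200,)
```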
The objective of density estimation is to:
find a probability distribution that describes a given dataset
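The simplest case is fitting a Gaussian to one-dimensional data by maximum likelihood, which reduces to computing the sample mean and standard deviation (the dataset below is hypothetical):

```python
import statistics

# Hypothetical 1-D dataset; fit a Gaussian by maximum likelihood
data = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]
mu = statistics.fmean(data)      # ML estimate of the mean
sigma = statistics.pstdev(data)  # ML estimate of the standard deviation
print(mu, sigma)
```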
Hadamard Product
an element-wise multiplication operation on matrix elements (matrices must be same dimensions)
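A minimal illustration in plain Python (matrix values are arbitrary):

```python
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Hadamard product: multiply corresponding entries; shapes must match
C = [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
print(C)  # [[5, 12], [21, 32]]
```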
The two major approaches to constructing predictor models are:
1. predictor as a function,
2. predictor as a probabilistic model
The three distinct algorithmic phases of machine learning are:
1. Training / parameter estimation
2. Hyperparameter tuning / model selection
3. Prediction / inference
One way to think of the distinction between parameters and hyperparameters is:
Parameters can be numerically optimized, while hyperparameters need to be selected using search techniques
The equation for average loss in empirical risk minimization is:
R_emp(f, X, y) = (1/N) Σ_{n=1}^{N} ℓ(y_n, ŷ_n), where ℓ is the loss on a single example and ŷ_n = f(x_n) is the predictor's output
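With squared loss, the empirical risk is just the mean squared error over the training set. A tiny sketch (the labels and predictions are made up):

```python
# Average squared loss over a tiny hypothetical dataset
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.3]

losses = [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]
R_emp = sum(losses) / len(losses)  # empirical risk = average loss
print(R_emp)
```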
The matrix-form of the least-squares equation for minimizing average loss is:
min_θ (1/N) ||y − Xθ||², whose closed-form solution is given by the normal equations θ = (XᵀX)⁻¹ Xᵀ y
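Solving the normal equations numerically, on synthetic data generated from known parameters so the answer can be checked:

```python
import numpy as np

# Hypothetical data: y = X @ theta_true with theta_true = [2, -1]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])

# Solve min_theta ||y - X theta||^2 via the normal equations
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # -> approximately [2., -1.]
```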
Introducing a penalty term for regularization can be thought of as:
biasing the search for the minimum of the loss function, which makes it harder for the optimizer to return an overly flexible predictor
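A concrete example is ridge regression, which adds an L2 penalty λ||θ||² to the least-squares loss and shrinks the parameters toward zero. A minimal sketch (data and λ values are illustrative):

```python
import numpy as np

# Ridge regression: least squares plus an L2 penalty lam * ||theta||^2,
# solved in closed form via (X^T X + lam * I) theta = X^T y
def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

print(ridge(X, y, 0.0))   # ordinary least squares: [1., 1.]
print(ridge(X, y, 10.0))  # the penalty shrinks the parameters toward zero
```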
What is one potential disadvantage of K-fold cross validation?
the computational cost of training the model K times can be burdensome when a single training run is expensive. This can be mitigated by parallelizing the cross-validation procedure, since the K folds are independent of one another.
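The fold structure itself can be sketched with indices alone (a hand-rolled split, not any particular library's API):

```python
# Manual K-fold split over example indices, K = 3
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]  # each fold is the test set exactly once
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in k_fold_indices(6, 3):
    print(train, test)
```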
Generalized linear models are defined as:
The class of models that have a linear dependence between parameters and data, combined with a potentially nonlinear transformation 𝜑 (called a link function)
Probabilistic models are specified by:
the joint distribution of all their random variables.
Nested Cross Validation
For each first-level train / test split, we perform another round of cross-validation where the inner level is used to estimate the performance of a particular choice of model or hyperparameter.
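The two-level structure can be sketched abstractly; here `evaluate` is a hypothetical placeholder for training and scoring a model on a split, and the hyperparameter candidates are made up:

```python
# Structural sketch of nested CV: the inner loop selects a hyperparameter,
# the outer loop estimates the performance of that selection procedure.
def splits(items, k):
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

def evaluate(train, test, lam):
    # Placeholder score; a real version would fit a model using lam
    return -abs(lam - 1.0)

data = list(range(12))
outer_scores = []
for outer_train, outer_test in splits(data, 3):
    # Inner CV on the outer training set picks the best hyperparameter
    best = max([0.1, 1.0, 10.0],
               key=lambda lam: sum(evaluate(tr, te, lam)
                                   for tr, te in splits(outer_train, 4)))
    outer_scores.append(evaluate(outer_train, outer_test, best))

print(sum(outer_scores) / len(outer_scores))
```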
Occam's razor for model selection implies:
the objective of model selection is to find the simplest model that explains the data reasonably well, since we assume that simpler models are less prone to overfitting
Describe the Bayesian interpretation of probability:
The Bayesian interpretation uses probability to specify the degree of uncertainty that the user has about an event; it is sometimes called subjective probability or degree of belief.