What do descriptive questions ask? - CORRECT ANSWER What happened? (e.g., which customers are most alike)
What do predictive questions ask? - CORRECT ANSWER What will happen? (e.g., what will Google's stock price be?)
What do prescriptive questions ask? - CORRECT ANSWER What action(s) would be best? (e.g., where to put traffic lights)
What is a model? - CORRECT ANSWER Real-life situation expressed as math.
What do classifiers help you do? - CORRECT ANSWER Differentiate between categories (classes) of data points
What is a soft classifier and when is it used? - CORRECT ANSWER In some cases, there won't be a line that separates all of the labeled examples. So we use a classifier that minimizes the number of mistakes.
What does it mean when the classifier/decision boundary is almost parallel to the vertical (y) axis? - CORRECT ANSWER The horizontal attribute is all that is needed.
What does it mean when the classifier/decision boundary is almost parallel to the horizontal (x) axis? - CORRECT ANSWER The vertical attribute is all that is needed.
What is time-series data? - CORRECT ANSWER The same data recorded over time, often at equal intervals
What is quantitative data? - CORRECT ANSWER Numbers with meaning: higher means more, lower means less (e.g., age, sales, temperature, income)
What is categorical data? - CORRECT ANSWER Numbers w/o meaning (e.g., zip codes), non-numeric (e.g., hair color), binary data (e.g., male/female, yes/no, on/off)
Which of these is time series data?
A. The average cost of a house in the United States every year since 1820
B. The height of each professional basketball player in the NBA at the start of the season - CORRECT ANSWER A
Which of these is structured data?
A. The contents of a person's Twitter feed
B. The amount of money in a person's bank account - CORRECT ANSWER B
What is structured data? - CORRECT ANSWER Data that can be described and stored in a structured way (e.g., the amount of money in a bank account)
What is unstructured data? - CORRECT ANSWER Data that is not easily described and stored (e.g., written text)
A survey of 25 people recorded each person's family size and type of car. Which of these is a data point?
A. The 14th person's family size and car type
B. The 14th person's family size
C. The car type of each person - CORRECT ANSWER A.
A data point is all the information about one observation
The farther the wrongly classified point is from the line ___ - CORRECT ANSWER The bigger the mistake we've made
The term including the margin gets larger, so the importance of a large margin outweighs avoiding mistakes and correctly classifying known data points. - CORRECT ANSWER As lambda gets larger
The margin term also drops toward zero, so the importance of minimizing mistakes and correctly classifying known data points outweighs having a large margin. - CORRECT ANSWER As lambda drops toward zero
What can SVMs be used for? - CORRECT ANSWER To find a classifier with maximum separation, or margin, between the two sets of points.
When to use SVM? - CORRECT ANSWER If it's impossible to avoid classification errors, SVM can find a classifier that trades off reducing errors and enlarging the margin.
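Error for data point j - CORRECT ANSWER What does this formula describe? $\max\{0,\ 1 - (\sum_i a_i x_{ij} + a_0)\,y_j\}$, where $a_i$ are the classifier's coefficients, $a_0$ its intercept, $x_{ij}$ the value of attribute $i$ for data point $j$, and $y_j \in \{-1, 1\}$ its label
Total error - CORRECT ANSWER What does this formula describe? $\sum_j \max\{0,\ 1 - (\sum_i a_i x_{ij} + a_0)\,y_j\}$
To maximize the distance between the two lines, what do we need to minimize? - CORRECT ANSWER $\sum_i a_i^2$, the sum of the squared coefficients (the distance between the lines $\sum_i a_i x_{ij} + a_0 = \pm 1$ is $2/\sqrt{\sum_i a_i^2}$)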
m_j > 1 - CORRECT ANSWER What value of m_j do we give more costly errors?
Giving a bad loan is twice as costly as withholding a good loan. - CORRECT ANSWER What does this mean in the context of giving a loan?
m_j < 1 - CORRECT ANSWER What value of m_j do we give less costly errors?
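The soft-margin SVM quantities in the cards above can be sketched in code. This is a minimal sketch, assuming NumPy, labels $y_j \in \{-1, 1\}$, a coefficient vector `a` with intercept `a0`, and the common form (total error) $+\ \lambda \sum_i a_i^2$; the function names `hinge_errors` and `soft_margin_objective` and the toy data are hypothetical. It only evaluates the objective for given coefficients and does not search for the best ones.

```python
import numpy as np


def hinge_errors(a, a0, X, y):
    """Error for each data point j: max{0, 1 - (sum_i a_i * x_ij + a0) * y_j}.

    X has one row per data point; y holds labels in {-1, +1}.
    Correctly classified points outside the margin get error 0; the farther a
    misclassified point is from the line, the larger its error.
    """
    margins = (X @ a + a0) * y
    return np.maximum(0.0, 1.0 - margins)


def soft_margin_objective(a, a0, X, y, lam, m=None):
    """(Optionally weighted) total error plus lam * sum of squared coefficients.

    m is an optional per-point cost multiplier m_j (m_j > 1 for more costly
    errors, m_j < 1 for less costly ones); it defaults to 1 for every point.
    A larger lam puts more weight on a large margin (small coefficients).
    """
    errors = hinge_errors(a, a0, X, y)
    m = np.ones_like(errors) if m is None else np.asarray(m, dtype=float)
    return float(np.sum(m * errors) + lam * np.sum(a ** 2))


# Toy example: the line x_1 = 0 misclassifies the third point.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, -1, -1])
a, a0 = np.array([1.0, 0.0]), 0.0
print(hinge_errors(a, a0, X, y))                                 # errors 0, 0, 1.5
print(soft_margin_objective(a, a0, X, y, lam=0.1))               # about 1.6
print(soft_margin_objective(a, a0, X, y, lam=0.1, m=[1, 1, 2]))  # about 3.1
```

Doubling m_j for the misclassified point doubles its contribution to the total error, which is how a more costly kind of mistake is made to count for more.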
Why is it important to scale our data when using SVM? - CORRECT ANSWER We're looking to minimize the sum of the squares of the coefficients, but if our attributes have very different scales, a small change in one coefficient could swamp a huge change in another.
What does it signify when a coefficient for a classifier is close to zero? - CORRECT ANSWER It means the corresponding attribute is probably not relevant.
What do kernel methods allow for in SVMs? - CORRECT ANSWER Nonlinear classifiers
What is the common range for scaled data? - CORRECT ANSWER between 0 and 1
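What is the formula for min-max scaling? - CORRECT ANSWER Find the minimum and maximum of each factor $j$, then $x_{ij}^{\text{scaled}} = \frac{x_{ij} - x_{\min j}}{x_{\max j} - x_{\min j}}$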
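What is common standardization and its formula? - CORRECT ANSWER Rescaling each factor to a mean of 0 and a standard deviation of 1 (this shifts and stretches the data but does not make it normally distributed): $x_{ij}^{\text{std}} = \frac{x_{ij} - \mu_j}{\sigma_j}$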
What is the formula for general scaling between b and a? - CORRECT ANSWER $x_{ij}^{\text{scaled}} = b + (a - b)\,\frac{x_{ij} - x_{\min j}}{x_{\max j} - x_{\min j}}$
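A minimal NumPy sketch of the three rescalings above, applied to one factor (column) at a time. The function names are hypothetical, `standardize` uses the population standard deviation (NumPy's default `ddof=0`), and the code assumes the factor is not constant (max > min and std > 0).

```python
import numpy as np


def min_max_scale(x):
    """Min-max scaling to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def scale_to_range(x, b, a):
    """General scaling to [b, a]: b + (a - b) * (min-max scaled x)."""
    return b + (a - b) * min_max_scale(x)


def standardize(x):
    """Standardization: (x - mean) / std, giving mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


x = [10.0, 20.0, 30.0]
print(min_max_scale(x))          # 0, 0.5, 1
print(scale_to_range(x, -1, 1))  # -1, 0, 1
print(standardize(x))            # mean 0, standard deviation 1
```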
When do you use scaling? - CORRECT ANSWER When data needs to be in a bounded range (e.g., inputs to neural networks, RGB values, SAT scores, batting averages)
When do you use standardization? - CORRECT ANSWER PCA or clustering
When is KNN used? - CORRECT ANSWER For classification problems, including ones with more than two classes.
How do you deal with attributes that might be more important than others in KNN? - CORRECT ANSWER You weight each dimension's distance differently. The larger the weight, the higher the impact.
A small value of k will lead to - CORRECT ANSWER a large variance in predictions
Setting a large value of k will ... - CORRECT ANSWER lead to a large model bias.
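A minimal sketch of k-nearest-neighbor classification with per-attribute weights, assuming a weighted Euclidean distance $\sqrt{\sum_i w_i (x_i - y_i)^2}$ (one common choice; the cards do not fix the distance) and a plain majority vote among the k nearest points. `weighted_knn_predict` and the toy data are hypothetical. Setting the second attribute's weight to 0 changes which neighbor is nearest; `k` is the parameter whose size trades variance (small k) against bias (large k).

```python
from collections import Counter

import numpy as np


def weighted_knn_predict(X_train, y_train, x_new, k, weights):
    """Predict the class of x_new by majority vote among its k nearest neighbors.

    Distance is weighted Euclidean: sqrt(sum_i w_i * (x_i - y_i)^2), so
    attributes with larger weights have a larger impact on who counts as "near".
    """
    X_train = np.asarray(X_train, dtype=float)
    diffs = X_train - np.asarray(x_new, dtype=float)
    w = np.asarray(weights, dtype=float)
    dists = np.sqrt((w * diffs ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]  # indices of the k closest training points
    votes = Counter(y_train[j] for j in nearest)
    return votes.most_common(1)[0][0]


X_train = [[0.0, 0.0], [1.0, 10.0], [10.0, 0.0]]
y_train = ["a", "b", "c"]  # three classes
x_new = [1.0, 0.0]
print(weighted_knn_predict(X_train, y_train, x_new, k=1, weights=[1.0, 1.0]))  # prints a
print(weighted_knn_predict(X_train, y_train, x_new, k=1, weights=[1.0, 0.0]))  # prints b
```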
What are real effects? - CORRECT ANSWER Real relationships between attributes and responses. They are the same in all data sets.
What are random effects? - CORRECT ANSWER They are random but look like real effects. They are different in all data sets.
Why can't we measure a model's effectiveness on the data it was trained on? - CORRECT ANSWER The model's performance on its training data is usually too optimistic. The model is fit to both real and random patterns in the data, so it becomes overly specialized to randomness in the training set that doesn't exist in other data.
If we use the same data to fit a model as we do to estimate how good it is, what is likely to happen? - CORRECT ANSWER The model will appear to be better than it really is.
The model will be fit to both real and random patterns in the data. The model's effectiveness on this data set will include both types of patterns, but its true effectiveness on other data sets (with different random patterns) will only include the real patterns
When comparing models, if we use the same data to pick the best model as we do to estimate how good the best one is, what is likely to happen? - CORRECT ANSWER The model will appear to be better than it really is.
The model with the highest measured performance is likely to be both good and lucky in its fit to random patterns.
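A small simulation of the selection effect described above, under made-up assumptions: 50 candidate models that are all truly 70% accurate, each measured on 200 validation points, so any difference in measured accuracy comes only from random effects. The winner's validation score overstates its true accuracy of 0.7, while re-measuring it on a fresh test set gives an estimate that is not inflated by the selection. The sizes and seed are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
true_accuracy = 0.7  # assumed: every candidate model is equally good
n_models, n_val, n_test = 50, 200, 200

# Measured validation accuracy = true skill plus the random effects of this sample.
val_scores = rng.binomial(n_val, true_accuracy, size=n_models) / n_val
best = int(np.argmax(val_scores))

# Re-measure the chosen model on a fresh test set with new random effects.
test_score = rng.binomial(n_test, true_accuracy) / n_test

print(f"best validation accuracy: {val_scores[best]:.3f}")
print(f"same model on test data:  {test_score:.3f}")
```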
What is a training set used for? - CORRECT ANSWER Used to fit the models
What is a validation set used for? - CORRECT ANSWER used to choose best model
Why would we use two sets? - CORRECT ANSWER If the training set had unique random effects that the classifier was fit to, we wouldn't be counting those benefits when we measure effectiveness on the separate validation set.
What effects does randomness have on training/validation performance? - CORRECT ANSWER Sometimes randomness makes performance look worse than it really is, and sometimes it makes performance look better than it really is.
How are high-performing models affected by randomness? - CORRECT ANSWER They are often boosted by above-average random effects, which makes them look better than they really are.
What is a test data set used for? - CORRECT ANSWER To estimate the performance of the chosen model
When do we need a validation set? - CORRECT ANSWER When we are choosing between multiple models.
What are the data splits when working with one model? - CORRECT ANSWER 70-90% training, 10-30% test
What are the data splits when comparing models? - CORRECT ANSWER 50-70% training, split the rest between validation and test
What are two methods of splitting data? - CORRECT ANSWER Random and rotation
What is the rotation method of splitting data? - CORRECT ANSWER You take turns assigning points to each set.
Example 5-point rotation sequence, repeated through the data: Training - Validation - Training - Test - Training
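A minimal sketch of the two splitting methods, assuming data points are identified by their index 0..n-1. The function names are hypothetical; `random_split` shuffles the indices and cuts them in the given proportions (the last set gets the remainder), and `rotation_split` deals points out in turn following a repeating pattern like the 5-point sequence above.

```python
import numpy as np


def random_split(n, fractions, seed=0):
    """Shuffle indices 0..n-1 and cut them into sets of the given proportions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    sizes = [int(round(f * n)) for f in fractions[:-1]]  # last set gets the rest
    return np.split(idx, np.cumsum(sizes))


def rotation_split(n, pattern):
    """Assign point j to set pattern[j % len(pattern)], taking turns in order."""
    sets = {name: [] for name in pattern}
    for j in range(n):
        sets[pattern[j % len(pattern)]].append(j)
    return sets


train, val, test = random_split(10, [0.6, 0.2, 0.2])
rot = rotation_split(10, ["train", "validation", "train", "test", "train"])
print(len(train), len(val), len(test))  # 6 2 2
print(rot)  # train: 0, 2, 4, 5, 7, 9; validation: 1, 6; test: 3, 8
```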
What is the advantage of rotation over randomness? - CORRECT ANSWER We make sure each part of the data is spread evenly across the sets.
What is the disadvantage of using rotation? - CORRECT ANSWER We have to make sure we aren't creating some other type of bias when we assign points.
What is k-fold cross-validation? - CORRECT ANSWER Split the training/validation data into k parts; train on k-1 parts and validate on the remaining part, rotating until each part has been used for validation once.
What metric do you use for k-fold cross validation when comparing models? - CORRECT ANSWER The average of all k evaluations.
What do we use when important data only appears in the validation or test sets? - CORRECT ANSWER cross-validation
What do we do after we've performed cross-validation? - CORRECT ANSWER We train the model again using all the data.
What are the benefits of k-fold cross-validation? - CORRECT ANSWER Better use of the data, a better estimate of model quality, and more effective model selection
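A minimal sketch of k-fold cross-validation, assuming NumPy arrays and user-supplied `fit` and `score` callables; these, the helper functions, and the toy data are hypothetical. It averages the k validation scores, and each point is used k-1 times for training and once for validation. The toy "model" just predicts the mean of the training responses; after choosing a model this way, it is refit on all of the data.

```python
import numpy as np


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Average validation score over k folds.

    fit(X_train, y_train) returns a model; score(model, X_val, y_val) returns
    a number where higher is better.
    """
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i in range(k):
        val_idx = folds[i]  # fold i is held out for validation
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores))


def fit_mean(X_tr, y_tr):
    return y_tr.mean()  # toy model: always predict the training mean


def neg_mse(model, X_v, y_v):
    return -float(np.mean((y_v - model) ** 2))  # higher is better


X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0]
cv_score = cross_validate(fit_mean, neg_mse, X, y, k=5)
print(cv_score)
final_model = fit_mean(X, y)  # refit the chosen model on all of the data
```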
What can clustering be used for? - CORRECT ANSWER Grouping data points (e.g., market segmentation) and discovering groups in data points (e.g., personalized medicine)
Which should we use most of the data for: training, validation, or test? - CORRECT ANSWER training
In k-fold cross-validation, how many times is each part of the data used for training, and for validation? - CORRECT ANSWER k-1 times for training, and 1 time for validation
What is rectangular distance useful for? - CORRECT ANSWER Calculating driving distance when the city is laid out in a grid
What is the value of p for Euclidean distance? - CORRECT ANSWER 2
What is the general equation for p-norm distance? - CORRECT ANSWER $\left(\sum_i |x_i - y_i|^p\right)^{1/p}$
2-norm - CORRECT ANSWER Straight-line distance corresponds to which distance metric?
How do you find the distance of an infinity norm? - CORRECT ANSWER You find the largest | x_i - y_i |
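The p-norm family can be sketched as one function, assuming NumPy; `p_norm_distance` is a hypothetical name. p = 1 gives rectangular distance, p = 2 Euclidean (straight-line) distance, and p = infinity the largest $|x_i - y_i|$.

```python
import numpy as np


def p_norm_distance(x, y, p):
    """p-norm distance: (sum_i |x_i - y_i|^p)^(1/p); p=np.inf gives max_i |x_i - y_i|."""
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(d.max())  # infinity norm: the largest coordinate difference
    return float((d ** p).sum() ** (1.0 / p))


x, y = (0.0, 0.0), (3.0, 4.0)
print(p_norm_distance(x, y, 1))       # 7 (rectangular)
print(p_norm_distance(x, y, 2))       # 5 (Euclidean)
print(p_norm_distance(x, y, np.inf))  # 4 (infinity norm)
```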
What is a centroid - CORRECT ANSWER the center of a cluster
What are the steps of k-means? - CORRECT ANSWER 0. Pick k cluster centers within the range of the data.
1. Assign each data point to the nearest cluster center.
2. Recalculate the cluster centers (centroids).
3. Repeat steps 1 and 2 until no assignments change.
How do we find the cluster centers? - CORRECT ANSWER We take the mean of all the data points in the cluster.
Why is k-means an expectation-maximization (EM) algorithm? - CORRECT ANSWER Finding the mean of all the points in a cluster is similar to finding an expectation, so that is the expectation step.
Assigning data points to cluster centers is the maximization step. Really we are minimizing distance, but we can think of it as maximizing the negative of the distance to a cluster center.
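A minimal NumPy sketch of the k-means steps listed above, assuming Euclidean distance and initial centers chosen as k distinct random data points (one common way to pick centers within the range of the data). The function name and toy data are hypothetical. Each center update is the mean of its points (the expectation-like step), and each reassignment picks the closest center (the distance-minimizing step).

```python
import numpy as np


def k_means(X, k, max_iter=100, seed=0):
    """Basic k-means following steps 0-3 above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 0: pick k initial centers within the range of the data
    # (here: k distinct data points chosen at random).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 3: stop once no assignment changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: recompute each center as the mean of its points.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers, labels


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = k_means(X, k=2)
print(labels)   # first three points share one label, last three the other
print(centers)  # roughly (1/3, 1/3) and (31/3, 31/3), in some order
```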
What are some of the consequences of outliers in k-means? - CORRECT ANSWER It will drag the cluster center artificially to one side.
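A tiny numeric illustration of the outlier effect, using made-up points: adding one far-away point to a four-point cluster moves the mean-based center from x = 1.5 to x = 5.4, even though the outlier is a single observation.

```python
import numpy as np

# Hypothetical cluster of four points around (1.5, 1.5).
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
center = cluster.mean(axis=0)  # centroid (1.5, 1.5)

# The same cluster plus one far-away outlier.
with_outlier = np.vstack([cluster, [[21.0, 1.5]]])
center_outlier = with_outlier.mean(axis=0)  # centroid (5.4, 1.5)

print(center, center_outlier)
```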