What is supervised learning?
A type of machine learning used when we want to predict a certain output from a given input and have examples of input/output pairs. The training set we feed into the algorithm includes the desired solutions (labels).
Labeled data: each instance comes with its expected output.
What are the two types of supervised machine learning?
(1) classification
- binary classification: 2 classes
- multiclass classification: classification between more than 2 classes
(2) regression
Examples of supervised machine learning algorithms?
(1) k-Nearest Neighbors (KNN)
(2) Logistic / linear regression
(3) Support Vector Machines (SVMs)
(4) Decision Trees and Random Forests
(5) Neural Networks
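To make the idea of learning from labeled input/output pairs concrete, here is a minimal sketch of a k-Nearest Neighbors classifier in plain Python. The function name `knn_predict` and the toy data are assumptions made for illustration only; a real project would more likely rely on an established library.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training instance's Euclidean distance to the query with its label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest training instances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training set: (height_cm, weight_kg) -> size class.
train_X = [(20, 2), (23, 3), (25, 4), (60, 30), (65, 35), (70, 40)]
train_y = ["small", "small", "small", "large", "large", "large"]

prediction = knn_predict(train_X, train_y, (22, 3))
```

Because the labels are class names, this is a classification task. Averaging the neighbors' numeric targets instead of voting would turn the same idea into a regression.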
What is unsupervised learning?
The training set is unlabeled. The algorithm tries to learn without a teacher.
Examples of unsupervised learning
- Clustering:
K-Means
DBSCAN
Hierarchical Cluster Analysis (HCA)
- Anomaly detection and novelty detection:
One-class SVM
Isolation Forest
- Visualization and dimensionality reduction
Principal Component Analysis (PCA)
Kernel PCA
Locally Linear Embedding (LLE)
t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning
Apriori
Eclat
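As an illustration of learning without labels, the following is a minimal sketch of K-Means in plain Python. It assumes a naive initialization (the first k points become the starting centroids) and a fixed number of iterations. Both are simplifications chosen for readability, and `k_means` is a hypothetical helper name.

```python
import math

def k_means(points, k, iterations=10):
    """Tiny K-Means sketch: returns (centroids, assignments)."""
    # Assumption: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
```

Note that no labels are passed in: the cluster indices in `assignments` are discovered from the geometry of the data alone.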
Dimensionality Reduction
Goal: Simplify the data without losing too much information
How?: Merge several correlated features into one.
For example, a car's mileage may be strongly correlated with its age, so the two can be merged into a single feature that represents the car's wear and tear.
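The sketch below shows one way such a merge could look, assuming hypothetical age and mileage values. It standardizes both features and projects them onto their first principal component, which for two standardized, positively correlated features is the direction (1/sqrt(2), 1/sqrt(2)). The names `wear` and `variance_kept` are illustrative only.

```python
import math
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical cars: age in years, mileage in thousands of km.
age = [1, 2, 4, 6, 8, 10]
mileage = [15, 28, 62, 85, 120, 148]

r = pearson(age, mileage)

# Merge the two standardized features into one by projecting onto (1, 1) / sqrt(2).
wear = [(a + m) / math.sqrt(2) for a, m in zip(standardize(age), standardize(mileage))]

# Fraction of the total variance that the single merged feature retains.
variance_kept = (1 + r) / 2
```

The closer the correlation `r` is to 1, the closer `variance_kept` is to 1, meaning little information is lost by keeping one feature instead of two.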
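Anomaly Detection vs. Novelty Detection
Novelty Detection: it aims to detect new instances that look different from all instances in the training set. This requires a very "clean" training set, free of any instance that you would like the algorithm to detect.
Anomaly Detection: it aims to detect unusual instances (outliers). The system is shown mostly normal instances during training, so the training set may itself contain a few anomalies; an instance is flagged when it looks rare or very different from the bulk of the data.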
For example, if you have thousands of pictures of dogs, and 1% of these pictures represent Chihuahuas, then a novelty detection algorithm should not treat new pictures of Chihuahuas as novelties. On the other hand, anomaly detection algorithms may consider these dogs as so rare and so different from other dogs that they would likely classify them as anomalies (no offense to Chihuahuas).
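The novelty-detection half of the Chihuahua example can be sketched with a deliberately simple score: the distance from a new instance to its nearest neighbor in the training set. This is only an illustrative stand-in, not how One-class SVM or Isolation Forest work, and the features, the threshold, and the helper names are all assumptions.

```python
import math

def novelty_score(training_set, instance):
    """Distance from `instance` to its nearest neighbor in the training set."""
    return min(math.dist(instance, x) for x in training_set)

def is_novel(training_set, instance, threshold):
    """Flag the instance as a novelty if nothing similar was seen in training."""
    return novelty_score(training_set, instance) > threshold

# Hypothetical 2-D features (height_cm, weight_kg): 99 typical dogs plus 1 Chihuahua.
typical_dogs = [(50 + i % 10, 20 + i % 7) for i in range(99)]
chihuahua = (18, 2)
training_set = typical_dogs + [chihuahua]

new_chihuahua = (19, 2.5)
threshold = 5.0  # assumed cut-off, chosen by hand for this toy data

seen_before = is_novel(training_set, new_chihuahua, threshold)   # Chihuahuas were in training
never_seen = is_novel(typical_dogs, new_chihuahua, threshold)    # Chihuahuas were not in training
```

With a Chihuahua in the training set the new Chihuahua is not flagged, while the same instance is flagged when the training set contains only typical dogs. An anomaly detector, by contrast, would score instances by how rare they are relative to the bulk of the data, so it could still flag the Chihuahua in both cases.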
Semi-supervised learning
Deals with data that is partially labeled, typically many unlabeled instances and a few labeled ones.
Feature scaling
A type of data transformation applied to the input features.
Why? With few exceptions, machine learning algorithms do not perform well when the input numerical attributes have very different scales.
(a) Min-max scaling (normalization): values are shifted and rescaled so that they range from 0 to 1
How? By subtracting the min value and dividing by (max - min)
(b) Standardization: by subtracting the mean value and dividing by the standard deviation, so the result has zero mean and unit variance
Pros: less affected by outliers than min-max scaling
Cons: values are not bounded to a specific range, which may be a problem for some algorithms (e.g., neural networks)
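Both scaling methods can be sketched in a few lines of plain Python. The function names and the sample values (including the single outlier) are assumptions for illustration, and both helpers assume the input values are not all identical, since that would cause a division by zero.

```python
import statistics

def min_max_scale(values):
    """Min-max scaling: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

# Hypothetical attribute with one outlier (1000).
incomes = [20, 30, 40, 50, 1000]

scaled = min_max_scale(incomes)
standardized = standardize(incomes)
```

With min-max scaling the outlier becomes 1 and squeezes the other four values into a narrow band just above 0. The standardized values are not confined to a fixed range, which illustrates the trade-off listed above.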