14.1 Unsupervised Learning
- So far we have been doing supervised learning, where we have a target we’re trying to predict.
- “How much will these homes sell for?”
- “How long will this person spend watching this video?”
- Unsupervised learning is useful when there is no specific target to predict, or when we want to explore structure and relationships in the data.
- Clustering is one very common type of unsupervised learning.
Clustering
Goal: put observations into groups
- Those in the same group should be similar to each other
- Those in different groups should be different.
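To make “similar within, different across” concrete, here is a minimal sketch that scores a grouping by comparing the average distance between points in the same group to the average distance between points in different groups. It assumes Euclidean distance; the helper name `within_between` and the toy points and labels are made up for illustration.

```python
import math
from itertools import combinations


def within_between(points, labels):
    """Return (mean within-group distance, mean between-group distance).

    Assumes Euclidean distance and at least one pair of each kind.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two tight groups that are far apart.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)]
good = [0, 0, 1, 1]  # groups match the visible structure
bad = [0, 1, 0, 1]   # each group mixes near and far points

print(within_between(points, good))  # small within, large between
print(within_between(points, bad))   # large within, smaller between
```

A good grouping has a small first number and a large second one; the bad grouping reverses that pattern.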
Crucial questions:
- How many groups?
- How do we define “similar” / “different”?
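One common algorithm, k-means (not otherwise covered in this section), makes both crucial questions explicit: the number of groups `k` is an input, and “similar” means a small Euclidean distance to a group’s center. The plain-Python sketch below assumes that setup and, for reproducibility, starts the centers at the first `k` points; practical implementations typically use smarter, often randomized, initialization.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) sketch.

    Returns (labels, centroids). Centroids start at the first k points,
    which is a simplifying assumption for this illustration.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)]  # hypothetical toy data
labels, centers = kmeans(points, 2)
print(labels, centers)
```

Note that `kmeans(points, 3)` on the same data still returns three groups: this algorithm never decides the number of groups for you.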
Some differences between clustering algorithms
- Do we need to specify the number of clusters?
- Do clusters have to have specific shapes?
- Does every observation have to be in a cluster (or can there be “outliers”)?
- Does every point have to be in exactly one cluster (or can there be “fuzzy” clusters)? (“hard” vs “soft” clustering)
- How fast is it? (does it scale to large datasets?)
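The “hard” vs “soft” distinction can be sketched with fixed cluster centers: a hard assignment picks the single nearest center, while a soft assignment gives every center a membership weight. The softmax-of-negative-squared-distance weighting, the `temperature` parameter, and the helper names are one simple illustrative choice, not the rule used by any particular algorithm.

```python
import math


def hard_assign(point, centroids):
    """Hard clustering: index of the single nearest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def soft_assign(point, centroids, temperature=1.0):
    """Soft clustering: membership weights that sum to 1.

    Weights are a softmax of negative squared distances (an assumed scheme).
    """
    scores = [-math.dist(point, c) ** 2 / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


centroids = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
between = (2.0, 0.0)      # exactly midway between the two centers
near_first = (0.5, 0.0)   # clearly closer to the first center

print(hard_assign(between, centroids), soft_assign(between, centroids))
print(hard_assign(near_first, centroids), soft_assign(near_first, centroids))
```

For the midpoint, the hard assignment has to break a tie arbitrarily (cluster 0 here), while the soft weights show it belongs equally to both clusters.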
Impact of distance metric
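- What if two items are close in one dimension but far apart in another? Whether they count as “similar” depends on the distance metric and on how each feature is scaled.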
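A small sketch of both effects, using made-up points: the first part shows Euclidean and Manhattan distance disagreeing about which point is closer when one candidate differs a lot in a single dimension, and the second shows how raw units (hypothetical income in dollars vs. age in years) can swamp a dimension until features are rescaled. The scale factors are assumed values chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance: one large gap is penalized heavily."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute gaps: every unit of difference counts equally."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query, candidates, metric):
    """Name of the candidate closest to query under the given metric."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))


# 1) The metric matters: X is far in one dimension, Y is moderately far in both.
q = (0.0, 0.0)
cands = {"X": (3.0, 0.0), "Y": (2.0, 2.0)}
print(nearest(q, cands, euclidean))  # Y (sqrt(8) is about 2.83, less than 3)
print(nearest(q, cands, manhattan))  # X (3 is less than 4)

# 2) Feature scale matters: hypothetical (income in $, age in years).
q2 = (50_000, 30)
people = {"B": (50_200, 70), "C": (53_000, 31)}
print(nearest(q2, people, euclidean))  # B: dollar gaps dominate the age gap

SCALE = (10_000, 10)  # assumed "typical spread" of each feature


def rescale(p):
    """Divide each feature by its assumed typical spread."""
    return tuple(v / s for v, s in zip(p, SCALE))


scaled_people = {name: rescale(p) for name, p in people.items()}
print(nearest(rescale(q2), scaled_people, euclidean))  # C after rescaling
```

In practice, features are often standardized before clustering for exactly this reason, but the right scaling is still a modeling choice.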