K-means
18. K-means
Clustering
- Can we divide a dataset of dogs into groups?
- Possible clusters: male/female, breed, colour, etc.
- So there is no single answer to how many clusters there are
- Because it is unsupervised, nothing is prelabelled; we look for patterns to see whether certain data points are close enough to belong together
Scale
- Because there's no single answer to how many clusters there are, the same dataset can be split into a different number of clusters
- The interpretation depends on how many clusters you choose
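A small illustration of the point above, using made-up dog weights (kg) as the only feature. `kmeans_1d` is a hypothetical helper, and starting the centroids at evenly spaced sorted values (instead of at random) is an assumption made only so the output is reproducible. The same nine values come out as two groups with k = 2 and as three groups with k = 3; neither grouping is "the" right one.

```python
def kmeans_1d(values, k, iters=50):
    """Minimal 1-D k-means returning the final groups of values."""
    xs = sorted(values)
    # Assumption: deterministic start at k values spread across the sorted data
    centroids = [xs[i * (len(xs) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        # Assign every value to its nearest centroid
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            groups[nearest].append(x)
        # Move each centroid to the mean of its group (empty groups keep theirs)
        new = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new == centroids:  # nothing moved, so the clustering has settled
            break
        centroids = new
    return groups


weights = [2, 3, 4, 10, 11, 12, 30, 31, 32]  # made-up example data
print(kmeans_1d(weights, 2))  # [[2, 3, 4, 10, 11, 12], [30, 31, 32]]
print(kmeans_1d(weights, 3))  # [[2, 3, 4], [10, 11, 12], [30, 31, 32]]
```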
k-means
- Choose the number of clusters (k = 3 in this example)
- Place k centroids randomly (the triangles)
- Assign each point in the dataset to the closest centroid
- Compute the new centroid of each cluster
- Move each centroid to the center of its group
- The centroid becomes the mean of the points assigned to it
→ Repeat the assign and move steps (here another iteration is needed to settle the bottom-right points) until the assignments stop changing
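The steps above can be sketched in plain Python for 2-D points. This is a minimal sketch, not a reference implementation: "place centroids randomly" is assumed here to mean picking k distinct data points at random (controlled by a hypothetical `seed` parameter), and a cluster that ends up empty simply keeps its old centroid.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Basic k-means on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Place k centroids randomly (assumption: at k randomly chosen data points)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assign each point to the closest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8), (1, 9), (1.5, 8.5), (2, 9)]
centroids, labels = kmeans(points, k=3)
print(centroids)
print(labels)
```

Which clusters come out depends on the random starting centroids, which is the local-minimum drawback listed below.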
Pros
- Very simple to implement
Drawbacks
- You have to choose the number of clusters yourself, so you need to try different values of k
- It is a local algorithm: it converges to a local minimum, so the initial placement of the centroids determines where the clusters end up
- Clusters may collapse
- Or the points may be split into clusters in strange ways
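A minimal sketch of the local-minimum drawback, assuming toy 1-D data with three obvious groups. `kmeans_from` is a hypothetical helper that runs k-means from given starting centroids and also reports the within-cluster sum of squared distances (SSE). One centroid per group finds the obvious clusters; two centroids crowded into the left group and one between the other two groups get stuck straight away with a much worse SSE.

```python
def kmeans_from(values, centroids, iters=100):
    """1-D k-means from explicit starting centroids; returns (groups, SSE)."""
    centroids = list(centroids)
    for _ in range(iters):
        # Assign each value to its nearest centroid
        groups = [[] for _ in centroids]
        for x in values:
            nearest = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            groups[nearest].append(x)
        # Move each centroid to its group mean (empty groups keep theirs)
        new = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    # Within-cluster sum of squared distances to each centroid
    sse = sum((x - centroids[c]) ** 2 for c, g in enumerate(groups) for x in g)
    return groups, sse


data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
good = kmeans_from(data, [2, 11, 21])   # one start per group
bad = kmeans_from(data, [1, 2.5, 16])   # two starts in the left group
print(good)  # ([[1, 2, 3], [10, 11, 12], [20, 21, 22]], 6)
print(bad)   # ([[1], [2, 3], [10, 11, 12, 20, 21, 22]], 154.5)
```

A common mitigation (for example, the `n_init` parameter of scikit-learn's `KMeans`) is to run several random starts and keep the result with the lowest SSE.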