K means

less than 1 minute read

Updated:

18. K means

Clustering

  • image
  • Can we divide the dataset of dogs
  • Male female, breed, colours, etc clusters
  • So no single answer of how many clusters there are
  • Bc its unsupervised, no one prelabelled and we are looking for patterns to see if certain datapoints are close enough to belong together

Scale

  • Bc thereโ€™s no answer of how many clusters, with the same dataset, there could be diff number of clusters
  • image
  • Interpretation is gonna depend

k-means

  1. Choose the number of clusters (k=3 in this example)
  2. Place k centroids randomly (the triangles)
  3. Assign each point in the dataset to the closest cluster center
  4. compute the new centroids for the clusters
    1. move the cluster to the centers of the group!!!
    2. Centroid becomes the mean of the points

Screenshot 2020-02-25 at 8 45 19 pm

Screenshot 2020-02-25 at 8 45 51 pm โ†’ another step to compute the best for bottom right points

Pros

  • Very easy and simple to implement

Drawback

  • You have to choose the number of clusters. Gotta try diff things
  • It is a local algorithm โ€“ it converges to local minimum so initial placement of clusters will determine where the clusters will end up
    • Clusters may collapse
    • Or it may scatter in a strange way

Leave a comment