unsupervised learning

Optimizing the number of clusters using Tibshirani’s gap statistic

  • 5 min read

When you cluster data, you are trying to find groups of objects that are similar to one another and different from the objects in other groups. In other words, you want to minimize the intra-cluster distance and maximize the inter-cluster distance. Clustering algorithms…
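
As a rough illustration of the two quantities mentioned above, here is a minimal Python sketch (not code from the post) that compares the average distance between points in the same cluster with the average distance between points in different clusters. The toy points, the cluster labels, and the helper names `mean_intra_distance` and `mean_inter_distance` are assumptions made up for this example.

```python
from itertools import combinations
from math import dist
from statistics import mean

# Toy 2-D points with hypothetical cluster labels (not data from the post).
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}

def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return mean(pairs)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for (_, first), (_, second) in combinations(clusters.items(), 2)
        for p in first
        for q in second
    ]
    return mean(pairs)

intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"intra={intra:.2f} inter={inter:.2f}")
```

A good clustering of this toy data should give a small intra-cluster value and a much larger inter-cluster value. This is only one simple way to measure the two quantities, and it is not the gap statistic itself.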

Storing a K-means model in R

  • 2 min read

K-means clustering is quick and dirty and generally provides some interesting results. However, the default kmeans function in R lacks some features, such as storing the model so that its centroids can be used for prediction on unseen data. That’s where flexclust comes in. Flexclust is a package that is designed around…
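
To make the idea of predicting from stored centroids concrete, here is a small language-neutral sketch in Python. It does not use R or the flexclust API. It assumes the centroids have already been saved from an earlier k-means fit, and the centroid values and the `predict` helper are hypothetical.

```python
from math import dist

# Hypothetical centroids, as if saved from an earlier k-means fit.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

def predict(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        for point in points
    ]

# Unseen points are assigned to whichever stored centroid is closest.
new_points = [(0.5, -0.2), (4.0, 6.0), (9.0, 1.0)]
labels = predict(new_points, centroids)
print(labels)
```

The point of the sketch is that the centroids alone are enough to assign new observations to clusters, which is why keeping them around as a model is useful.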