unsupervised learning

Optimizing the number of clusters using Tibshirani’s gap statistic

by roelpi
October 17, 2019 (updated August 12, 2020)
5 min read

When you are clustering, what you are actually trying to do is find groups of objects that are similar to one another and different from the objects in other groups. In other words, you want to minimize the intra-cluster distance and maximize the inter-cluster distance. Clustering algorithms…
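The intra-cluster side of that trade-off is what Tibshirani, Walther and Hastie's gap statistic measures. It compares log W_k, the pooled within-cluster dispersion for k clusters, with its expected value under a reference distribution that has no cluster structure. It then picks the smallest k with Gap(k) >= Gap(k+1) - s_(k+1). The Python below is a minimal sketch of that idea, not the R implementation. It assumes Euclidean distance, uniform reference data drawn over the bounding box of the observations, and a plain Lloyd's k-means as the clustering step. The function names (kmeans, within_dispersion, gap_statistic, choose_k) are hypothetical. In R, the cluster package's clusGap() function is the usual ready-made option.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing seed is farthest away.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its closest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster empties.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean_point(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def within_dispersion(points, labels):
    """W_k: pooled within-cluster sum of squared distances to each centroid
    (equal to sum_r D_r / (2 n_r) with D_r the pairwise squared distances)."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        c = mean_point(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def gap_statistic(points, k, n_refs=10, seed=0):
    """Return (Gap(k), s_k) using uniform reference sets over the bounding box."""
    rng = random.Random(seed)
    log_w = math.log(within_dispersion(points, kmeans(points, k)))
    dims = list(zip(*points))
    lows, highs = [min(d) for d in dims], [max(d) for d in dims]
    ref_logs = []
    for _ in range(n_refs):
        ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        ref_logs.append(math.log(within_dispersion(ref, kmeans(ref, k))))
    mean_ref = sum(ref_logs) / n_refs
    sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
    return mean_ref - log_w, sd * math.sqrt(1 + 1 / n_refs)


def choose_k(points, k_max, n_refs=10, seed=0):
    """Smallest k in 1..k_max with Gap(k) >= Gap(k+1) - s_(k+1)."""
    stats = [gap_statistic(points, k, n_refs, seed) for k in range(1, k_max + 2)]
    for k in range(1, k_max + 1):
        gap_next, s_next = stats[k]
        if stats[k - 1][0] >= gap_next - s_next:
            return k
    return k_max


if __name__ == "__main__":
    # Toy data: three tight, well-separated groups of four points each.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    offsets = [(-0.2, 0.1), (0.1, -0.2), (0.2, 0.2), (-0.1, -0.1)]
    data = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
    for k in range(1, 5):
        gap, s = gap_statistic(data, k)
        print(k, round(gap, 3), round(s, 3))
    print("chosen k:", choose_k(data, k_max=4))
```

Using the same seed for every k means each k is compared against the same reference points. This is a deliberate simplification; the gap statistic only requires B independent reference sets per k.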

Storing a K-means model in R

by roelpi
August 23, 2019 (updated April 5, 2021)
2 min read

K-means clustering is quick and dirty and generally provides some interesting results. However, the default kmeans function in R lacks some features, such as a predict method that uses the fitted centroids to assign unseen data to clusters. That’s where flexclust comes in. Flexclust is a package that is designed around…
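R's kmeans already returns the fitted centroids, but there is no predict method for reusing them on new observations. The Python below is a minimal sketch of that missing step. It shows what "storing the model" means for k-means: keep the centroids after fitting, then label unseen points by their nearest centroid without refitting. It does not reproduce flexclust's API. The class name KMeansModel, its methods, and the farthest-point seeding are hypothetical choices made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the stored centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


class KMeansModel:
    """Hypothetical minimal k-means model that keeps its centroids after fitting."""

    def __init__(self, k, iters=100):
        self.k = k
        self.iters = iters
        self.centroids = []

    def fit(self, points):
        # Farthest-point seeding (deterministic; an illustrative choice).
        self.centroids = [tuple(points[0])]
        while len(self.centroids) < self.k:
            self.centroids.append(tuple(max(
                points, key=lambda p: min(sq_dist(p, c) for c in self.centroids))))
        # Lloyd iterations: assign to the nearest centroid, then recompute means.
        for _ in range(self.iters):
            labels = [nearest(p, self.centroids) for p in points]
            new = []
            for j in range(self.k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else self.centroids[j])
            if new == self.centroids:
                break
            self.centroids = new
        return self

    def predict(self, new_points):
        # Unseen data is labelled by the stored centroids; nothing is refitted.
        return [nearest(p, self.centroids) for p in new_points]


if __name__ == "__main__":
    train = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (5.0, 5.0), (5.3, 4.8), (4.9, 5.4)]
    model = KMeansModel(k=2).fit(train)
    print(model.predict([(0.2, 0.1), (5.1, 5.1)]))  # [0, 1]
```

Because prediction needs nothing but the centroid list, persisting this kind of model amounts to saving those coordinates, for example with Python's pickle or json modules.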