k-Means Clustering

Distance Metrics

Vespucci supports the following distance metrics for k-means clustering

Initialization Methods

Vespucci supports the following initialization methods, which select the initial centroids for the k-means algorithm.

Method	Description
Sample Initialization (Forgy)	Select k spectra at random to serve as initial centroids
Random Partition	Assign each spectrum to a random cluster, then use the centroids of those clusters as initial centroids
Refined Start (Bradley-Fayyad)	Perform k-means on smaller, random subsamples of the data, then use the centroids from the subsample runs as initial centroids
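
The three methods in the table can be illustrated with a minimal, pure-Python sketch. This is not Vespucci's or mlpack's code: every function name and parameter here (forgy_init, random_partition_init, refined_start_init, subsamples, fraction) is hypothetical, the refined start is a simplified reading of the Bradley-Fayyad idea, and squared Euclidean distance is assumed only for the sake of the example.

```python
import random

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance (an assumption made for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def lloyd(points, centroids, iterations=20):
    """Plain Lloyd iterations; an empty cluster keeps its previous centroid."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def distortion(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)

def forgy_init(spectra, k, rng):
    """Sample initialization: pick k distinct spectra as initial centroids."""
    return [list(s) for s in rng.sample(spectra, k)]

def random_partition_init(spectra, k, rng):
    """Random partition: label every spectrum at random, then average each cluster."""
    labels = [rng.randrange(k) for _ in spectra]
    # Force k distinct spectra into k distinct clusters so no cluster is empty.
    for cluster, idx in enumerate(rng.sample(range(len(spectra)), k)):
        labels[idx] = cluster
    return [mean([s for s, lab in zip(spectra, labels) if lab == c])
            for c in range(k)]

def refined_start_init(spectra, k, rng, subsamples=4, fraction=0.5):
    """Simplified refined start: cluster random subsamples, then pick the
    subsample solution that best summarizes all subsample centroids."""
    size = max(k, int(len(spectra) * fraction))
    solutions = []
    for _ in range(subsamples):
        sample = rng.sample(spectra, size)
        solutions.append(lloyd(sample, forgy_init(sample, k, rng)))
    # Pool every subsample centroid, refine each solution on the pool,
    # and keep the refined solution with the lowest distortion.
    pooled = [c for sol in solutions for c in sol]
    refined = [lloyd(pooled, sol) for sol in solutions]
    return min(refined, key=lambda cents: distortion(pooled, cents))

# Toy "spectra": two well-separated groups of 2-point spectra.
spectra = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
rng = random.Random(0)
print(forgy_init(spectra, 2, rng))
print(random_partition_init(spectra, 2, rng))
print(refined_start_init(spectra, 2, rng))
```

Sample initialization is the cheapest option, since it only draws spectra. Random partition tends to place all initial centroids near the overall mean of the data, and the refined start spends extra k-means runs on subsamples in exchange for a starting point that is less sensitive to a single unlucky draw.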

Vespucci uses mlpack's k-means implementation via the KMeansWrapper class in the Vespucci library.