This repository has been archived by the owner on Feb 18, 2020. It is now read-only.

k-Means Clustering

Daniel Patrick Foose edited this page Jan 17, 2017 · 5 revisions

Distance Metrics

Vespucci supports the following distance metrics for k-means clustering

Initialization Methods

Vespucci supports the following initialization methods, which are used to select the initial centroids for the k-means algorithm.

| Method | Description |
| --- | --- |
| Sample Initialization (Forgy) | Select k spectra at random to serve as the initial centroids. |
| Random Partition | Assign each spectrum to a random cluster, then use the centroids of those clusters as the initial centroids. |
| Refined Start (Bradley-Fayyad) | Run k-means on several smaller random subsamples of the data, then use the centroids found on the subsamples to derive the initial centroids. |
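
The three strategies in the table can be sketched in plain Python. This is only an illustration of the ideas, not the code Vespucci or mlpack actually runs: every function name, the `n_subsamples` and `fraction` parameters, the fixed iteration count, and the handling of empty clusters are assumptions made for this example, and the refined start is a simplified reading of the Bradley-Fayyad procedure.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def forgy_init(spectra, k, rng):
    """Sample initialization: pick k distinct spectra as initial centroids."""
    return [list(s) for s in rng.sample(spectra, k)]


def random_partition_init(spectra, k, rng):
    """Assign every spectrum to a random cluster, then average each cluster."""
    order = list(range(len(spectra)))
    rng.shuffle(order)
    labels = [0] * len(spectra)
    for pos, idx in enumerate(order):
        # Assumption of this sketch: the first k shuffled spectra seed one
        # cluster each, so no cluster is empty when the means are taken.
        labels[idx] = pos if pos < k else rng.randrange(k)
    return [mean([s for s, lab in zip(spectra, labels) if lab == c])
            for c in range(k)]


def lloyd(points, centroids, iters=20):
    """Plain k-means iterations (fixed count, no convergence test)."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Assumption of this sketch: an empty cluster keeps its old centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def distortion(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def refined_start_init(spectra, k, rng, n_subsamples=4, fraction=0.5):
    """Simplified refined start: cluster subsamples, then cluster their centroids."""
    size = max(k, int(len(spectra) * fraction))
    solutions = []
    for _ in range(n_subsamples):
        subsample = rng.sample(spectra, size)
        solutions.append(lloyd(subsample, forgy_init(subsample, k, rng)))
    # Pool all subsample centroids, re-cluster the pool starting from each
    # subsample solution, and keep the result with the lowest distortion.
    pooled = [c for sol in solutions for c in sol]
    candidates = [lloyd(pooled, sol) for sol in solutions]
    return min(candidates, key=lambda cand: distortion(pooled, cand))
```

Each function takes a list of spectra (lists of floats), the number of clusters `k`, and a `random.Random` instance, and returns `k` centroids that a k-means run could start from. Passing the generator explicitly keeps the sketch reproducible for a given seed.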

Under the Hood

Vespucci uses mlpack's k-means implementation via the KMeansWrapper class in the Vespucci library.

Further Reading