GitHub - teddyroland/python-hartigan: Unsupervised method to determine "correct" number of K-Means in Python

PYTHON-HARTIGAN is a python script that provides an unsupervised method to determine an optimal value K in K-Means clustering. The script is an implementation of Hartigan's Rule -- essentially, it measures the change in goodness-of-fit as the number of clusters increases. This implementation is based on Chiang & Mirkin (2009), §3.1(B). Feel free to change the script in any way; it is commented to guide any edits you may need to make.

If you want to build your intuition for Hartigan's Rule, I suggest setting the 'threshold' parameter to 0 or a small positive value and including a line in the while-loop that will print 'H_Rule' after each iteration. I also highly recommend looking over Chiang & Mirkin's article which notes some situations in which other unsupervised methods perform better.

PYTHON-HARTIGAN relies on two standard scientific Python packages: numpy & scikit-learn.

A similar function in R is FitKMeans() in the 'useful' package, which also implements Hartigan's Rule.

Chiang & Mirkin (2009): http://www.dcs.bbk.ac.uk/~mirkin/papers/00357_07-216RR_mirkin.pdf

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
README.md		README.md
hartigan.py		hartigan.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

teddyroland/python-hartigan

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages