Sort-means appears to have a bug #14

ghamerly · 2018-07-10T16:16:10Z

Sort-means is giving slightly different results in some of my testing, compared to all the other algorithms. I have a test case where every other algorithm converges in 45 iterations, but it takes 42, and gives slightly different results (centers).

Steps to reproduce:

run naive k-means and sort-means on the attached input, with k=10, using the library's k-means++ initialization, with a max iterations of 1000
expected result: same number of iterations and centers
actual result: different centers, different iterations

The initial centers chosen by k-means++ on this dataset are:

    -0.379863       1.80735      0.658205      0.236417       1.52649 
    -0.456454      -0.44091      0.932621        1.4306      -0.77737 
      1.10611      -1.30305     -0.202388      -1.29549      0.150969 
     0.951702      0.885737    -0.0140388     -0.295967      -2.10619 
      2.52547      0.803819       1.57157      0.481766      0.168094 
     -0.35617      0.423922       0.72442     -0.557642      0.466705 
     0.619142      -1.08521      -1.85976      -1.19759     -0.763672 
    -0.929235      0.901898     -0.717973      -2.60481     0.0617718 
    -0.391341      0.258408     -0.368092       0.53643      -1.35026 
     0.242555       -1.5091   -0.00501431       0.30317         1.244

bad_input.txt.gz

The text was updated successfully, but these errors were encountered:

cur33 · 2018-08-19T20:29:30Z

I've started looking at this; so far, I've replicated these results, but nothing strikes me as obviously wrong about the implementation. I'll run some more detailed tests to see if I can find anything.

ghamerly · 2018-08-19T20:33:51Z

I would probably tackle this by printing out detailed information about each iteration (both for sort-means and for the naive algorithm), identifying the earliest iteration where something goes wrong (some assignment is incorrect), and drilling into why that happened.

cur33 · 2018-08-19T20:47:21Z

Will do!

ghamerly added the bug Something isn't working label Jul 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sort-means appears to have a bug #14

Sort-means appears to have a bug #14

ghamerly commented Jul 10, 2018

cur33 commented Aug 19, 2018 •

edited

Loading

ghamerly commented Aug 19, 2018

cur33 commented Aug 19, 2018

Sort-means appears to have a bug #14

Sort-means appears to have a bug #14

Comments

ghamerly commented Jul 10, 2018

cur33 commented Aug 19, 2018 • edited Loading

ghamerly commented Aug 19, 2018

cur33 commented Aug 19, 2018

cur33 commented Aug 19, 2018 •

edited

Loading