cluster selection based on stability is really sensitive to the selection of fall_off_size #43

chongxi · 2016-07-07T17:44:25Z

Thanks for providing this amazing library. I learnt a lot from your implementation. Among all the clustering schemes I've tried, this is one of the best so far.

I spent a while studying your Cython implementation and the ideas behind it. The minimum spanning tree is remarkably informative. In practical use, however, I keep running into a problem with the automatic cluster selection: it is too sensitive to the choice of fall_off_size(min_sample). Besides, when two obviously distinct clusters are connected by a few noisy points in between, they are likely to be merged into one. I understand that the mutual reachability distance is meant to address that, but automatic selection based on stability seems to bring the quality down.

Everything up to the automatic selection seems perfect to me: you have a minimum spanning tree to cut and a single linkage tree that supports any migration, split, or merge. I do feel there is room to improve, or even replace, the condensed tree for automatic cluster selection.
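To make the "minimum spanning tree to cut" idea and the bridging problem concrete, here is a minimal pure-Python sketch. It is not the library's Cython implementation: the helper names (`mutual_reachability`, `minimum_spanning_tree`, `cut_tree`), the toy data, and the convention that the core distance excludes the point itself are all assumptions made for illustration.

```python
import math

def mutual_reachability(points, min_samples):
    """Pairwise mutual reachability distances for a small point set."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Core distance: distance to the min_samples-th nearest neighbour.
    # Index 0 of the sorted row is the point itself (assumed convention).
    core = [sorted(row)[min_samples] for row in dist]
    return [[0.0 if i == j else max(core[i], core[j], dist[i][j])
             for j in range(n)] for i in range(n)]

def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix; returns (u, v, w) edges."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges

def cut_tree(n, edges, cut_distance):
    """Keep MST edges no longer than cut_distance; label the components."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for a, b, w in edges:
        if w <= cut_distance:
            root[find(a)] = find(b)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Two tight groups of four points plus a single "bridge" point between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5.5, 0.5)]
weights = mutual_reachability(points, min_samples=1)
edges = minimum_spanning_tree(weights)

tight = cut_tree(len(points), edges, cut_distance=2.0)  # bridge stays isolated
loose = cut_tree(len(points), edges, cut_distance=5.0)  # bridge joins both groups
print(tight)
print(loose)
```

In this toy data the two groups are 9 units apart, yet a cut at 5.0 merges them because the single bridge point is within about 4.5 units of each side. That is the behaviour described above, and a manual cut at a smaller distance avoids it.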


lmcinnes · 2016-07-08T01:12:41Z

Thanks for the compliments on the implementation. I admit that the automatic cluster selection may not be to everyone's taste, but it is a good default for a large number of cases. Since the single_linkage_tree_ and condensed_tree_ are both exposed as attributes of the model after fitting, those who wish to do something else are able to.

On the other hand, if the question is one of sensitivity to the min_samples parameter (rather than min_cluster_size), I may have some answers there. I have been working on a different algorithm that essentially operates over all (or potentially just many) min_samples values and computes a total stability over the combined epsilon and min_samples space. This requires some significant rethinking of how to interpret the algorithm, and I've been drawing heavily from persistent homology (and more accurately persistent homotopy) theory to get something workable. There are still a number of details to hammer out and some work to be done to ensure the resulting algorithm really does return useful clusterings, but I believe it has significant promise.
