Thanks for providing this amazing library. I learnt a lot from your implementation. Among all the clustering schemes I've tried, this is one of the best so far.
I spent a while studying your amazing Cython implementation and the ideas behind it. The minimum spanning tree is really informative. However, in practical use, I always feel there is a problem with the automatic cluster selection. It is just too sensitive to the choice of fall_off_size(min_sample). Besides, when two obviously distinct clusters are connected by a very few noisy points in between, they are likely to be put together. I understand mutual distance is used to address that, but automatic selection based on stability seems to bring the quality down.
Before the automatic selection I think everything is perfect: you have a minimum spanning tree to cut, and a single linkage tree on which to do any migration, split or merge. I do feel there is room to improve or even replace the condensed tree for automatic cluster selection.
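To make the "minimum spanning tree to cut" idea concrete, here is a minimal sketch in plain Python. It is not the library's Cython code: it assumes plain Euclidean distances instead of the mutual distance, and the helper names `minimum_spanning_tree` and `cut_tree` are hypothetical, chosen only for this illustration.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, weight) edges. Illustrative only, O(n^2).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

def cut_tree(n, edges, threshold):
    """Drop edges longer than `threshold` and label the remaining components."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j, w in edges:
        if w <= threshold:
            root[find(i)] = find(j)
    labels = {}
    # Number components in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = minimum_spanning_tree(points)
print(cut_tree(len(points), edges, 2.0))  # [0, 0, 0, 1, 1, 1]
```

A fixed cut like this needs a hand-picked threshold, and a chain of noisy points between the two groups would keep them joined at any threshold above the chain's longest edge.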
Thanks for the compliments on the implementation. I admit that the automatic cluster selection may not be to everyone's taste, but it is a good default for a large number of cases. Since the single_linkage_tree_ and condensed_tree_ are both exposed as attributes of the model after fitting, I feel those who wish to do something else are able to do so.
On the other hand, if the question is one of sensitivity to the min_samples parameter (rather than min_cluster_size) I may have some answers there. I have been working on a different algorithm that essentially operates over all (or potentially just many) min_samples values and computes a total stability over the combined epsilon and min_samples space. This requires some significant rethinking of how to interpret the algorithm, and I've been drawing heavily from persistent homology (and more accurately persistent homotopy) theory to get something workable. There are still a number of details to hammer out and some work to be done to ensure the resulting algorithm really does return useful clusterings, but I believe it has significant promise.
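As a rough illustration of why results depend on min_samples, the sketch below computes the mutual reachability distance for several values of k. It is not the in-progress algorithm described above. It assumes the usual definition, max(core_k(a), core_k(b), d(a, b)), with the core distance taken as the distance to the k-th nearest other point; the helper names are hypothetical, and the library's own neighbour-counting convention may differ.

```python
import math

def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point (assumed convention)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def mutual_reachability(points, i, j, k):
    """max(core_k(i), core_k(j), d(i, j)) for the given k (playing the role of min_samples)."""
    return max(
        core_distance(points, i, k),
        core_distance(points, j, k),
        math.dist(points[i], points[j]),
    )

# Three close points and one distant point, all on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
for k in (1, 2, 3):
    # The same pair of neighbours looks increasingly far apart as k grows.
    print(k, mutual_reachability(points, 0, 1, k))
```

On this toy data the distance between the first two points goes from 1.0 to 2.0 to 10.0 as k goes from 1 to 3, so the tree built on these distances changes with k.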