I experienced a hard Python crash on some real-life data. It turns out to be triggered by a call to .minimum_spanning_tree_.plot() when many points are exactly equal. A minimal example to reproduce:
import numpy as np
import hdbscan
model = hdbscan.HDBSCAN(gen_min_span_tree=True)
data = np.zeros((91, 3))
clustering = model.fit(data)
clustering.minimum_spanning_tree_.plot()
Note that it also happens when only a relatively small proportion of points are equal (but only sometimes?); this is just the easiest way to show it.
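To check whether a dataset contains the kind of exact duplicates described above, the share of repeated points can be measured before plotting. This is a minimal sketch using only the standard library; `duplicate_fraction` is a hypothetical helper, not part of hdbscan or scikit-learn, and it is not known which fraction (if any) reliably triggers the crash.

```python
from collections import Counter


def duplicate_fraction(rows):
    """Return the fraction of rows that are exactly equal to at least one other row.

    `rows` can be any iterable of numeric sequences, e.g. a list of lists
    or a 2-D NumPy array (iterating over it yields one row at a time).
    """
    # Count how often each exact coordinate tuple occurs.
    counts = Counter(tuple(float(x) for x in row) for row in rows)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # A row counts as duplicated if its coordinates occur more than once.
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total


# For the all-zero reproduction data, every point is a duplicate:
# duplicate_fraction(np.zeros((91, 3)))
```

A high value only indicates that the data resembles the reproduction case; it does not by itself prove the plot call will crash.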
I looked into this and it appears to be a problem in sklearn.manifold._t_sne._barnes_hut_tsne.gradient(), which is not (always?) able to handle NaNs. For example:
One layer deeper, the crash occurs inside sklearn.neighbors._quad_tree._QuadTree.build_tree():
The output of this (due to verbose=11) up to the crash is:
I didn't dig into the QuadTree code.
I'm unsure whether this is a bug in hdbscan, scikit-learn, Cython or CPython...?
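For anyone trying to narrow down which layer is at fault, one option is to run the reproduction in a child interpreter with Python's `faulthandler` enabled, so that a hard crash leaves a traceback on stderr instead of silently killing the session. This is a minimal sketch; `run_isolated` and `REPRO` are hypothetical names, and the snippet in `REPRO` is simply the reproduction from this report.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means
    the child process was terminated by a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# The reproduction from this report, kept as a string so it only runs
# inside the child process.
REPRO = """
import numpy as np
import hdbscan
model = hdbscan.HDBSCAN(gen_min_span_tree=True)
data = np.zeros((91, 3))
clustering = model.fit(data)
clustering.minimum_spanning_tree_.plot()
"""

# Example usage (requires numpy, hdbscan and matplotlib to be installed):
# code, err = run_isolated(REPRO)
# print(code)
# print(err)
```

If the child dies from a fatal signal, the stderr text should contain the Python-level stack at the moment of the crash, which may help decide where the bug belongs.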