Hi, I am using HDBSCAN to cluster text embeddings.
As the data is unbalanced in favor of one category of embeddings, I am obtaining too many sub-clusters of that category, which I would like to squash together. I have found that datapoints with a cosine distance <0.7 should belong in the same cluster, and if I understand correctly I should set cluster_selection_epsilon=0.7 to achieve this outcome.
This doesn't seem to be working, as all the datapoints end up in the same cluster (is the value too high?).
My current code:
cluster_labels:
cosine_dist:
Is this the correct use of cluster_selection_epsilon? Thanks
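A rough way to sanity-check whether 0.7 is too high is to look at what share of all pairwise cosine distances already falls below it. The sketch below uses only the standard library and hypothetical toy vectors (`toy`, `cosine_distance`, and `fraction_within` are illustrative names, not part of the code above or of the HDBSCAN API). It is only a proxy: HDBSCAN applies cluster_selection_epsilon to distances in its cluster tree, which are based on mutual reachability rather than raw pairwise distance.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity; ranges from 0 to 2."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def fraction_within(vectors, epsilon):
    """Share of all pairs whose cosine distance is strictly below epsilon."""
    dists = [cosine_distance(u, v) for u, v in combinations(vectors, 2)]
    return sum(d < epsilon for d in dists) / len(dists)


# Hypothetical toy embeddings (assumption): two tight groups that point
# in clearly different directions. Replace with the real embeddings.
toy = [
    (1.0, 0.0), (0.9, 0.1),
    (0.0, 1.0), (0.1, 0.9),
]

# Compare a few candidate thresholds, including the 0.7 from the question.
for eps in (0.05, 0.7, 1.5):
    print(f"epsilon={eps}: {fraction_within(toy, eps):.2f} of pairs are closer")
```

If nearly every pair in the real embeddings is closer than 0.7, then a threshold of 0.7 covers almost the whole dataset, and a single merged cluster would not be surprising. In that case a smaller value, picked from the observed distance distribution, may be worth trying.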