Different exemplars from same clusters with Numpy 2 on different platforms #655

Open
changhsinlee opened this issue Sep 4, 2024 · 0 comments


What

I found that after upgrading from numpy 1 to numpy 2, the clustering results differ between platforms. This did not happen on numpy 1. I also tried setting the numpy seed and PYTHONHASHSEED, and neither helped.

How to reproduce

Poetry dependencies:

# poetry.toml
[tool.poetry.dependencies]
python = "^3.12"
pandas = "^2.2.2"
numpy = "^1.26.4"
hdbscan = ">=0.8.38"
scikit-learn = "^1.5.1"

The issue appeared when I upgraded from numpy 1.26.4 to numpy 2.1.1 while keeping all other packages the same.

You can reproduce it by reading the data below into a dataframe, then running HDBSCAN.fit(df) with cluster_selection_epsilon = 0.15 plus the parameters in the json file.

data.json

The platform names below are as printed by platform.platform():

  • On Linux-6.5.11-linuxkit-x86_64-with-glibc2.36, the exemplars for cluster 4 have 10 items (this is running on an Apple M2).
  • On Linux-5.10.223-212.873.amzn2.x86_64-x86_64-with-glibc2.36, the exemplars for cluster 4 have only 5 items (this is running on one of the AWS machines, but it seems to happen on all the EC2 instances we have).

Both platforms returned the same clusters; only the exemplars are different. Also, on numpy 1 they returned the same exemplars.
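
One possible way to compare the exemplars between the two machines is to print a small fingerprint on each and diff the output. This is only a minimal sketch: the helper names `summarize_exemplars` and `report` are hypothetical, and rounding to 6 decimals is an arbitrary assumption meant to hide tiny floating-point noise. It uses only the standard library and expects plain nested lists, one list of points per cluster.

```python
import hashlib
import json
import platform


def summarize_exemplars(exemplars, decimals=6):
    """Return per-cluster exemplar counts plus a short digest of the points.

    `exemplars` holds one entry per cluster, each a sequence of points
    (each point a sequence of numbers).
    """
    summary = {}
    for cluster_id, points in enumerate(exemplars):
        # Round and sort so the digest ignores row order and tiny float noise.
        rounded = sorted(
            tuple(round(float(x), decimals) for x in point) for point in points
        )
        digest = hashlib.sha256(json.dumps(rounded).encode()).hexdigest()[:12]
        summary[cluster_id] = {"count": len(rounded), "digest": digest}
    return summary


def report(exemplars):
    """Bundle the platform string with the exemplar summary for easy diffing."""
    return {
        "platform": platform.platform(),
        "exemplars": summarize_exemplars(exemplars),
    }
```

With a fitted clusterer, this could be called as `report([e.tolist() for e in clusterer.exemplars_])` on each machine. A differing `count` for cluster 4 would match the 10 vs. 5 items described above, while equal counts with differing digests would point to different points being selected.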
