Issue with Reproducing UMAP Model from Outliers Tutorial #1164

Open
pperezdomin opened this issue Nov 27, 2024 · 0 comments

Description:
I am unable to reproduce the last UMAP model from the "Outliers" tutorial section of the official UMAP documentation.

The tutorial suggests using the set_op_mix_ratio=0.25 parameter to create a UMAP model for detecting outliers, but the plot I get from running the code does not match the example shown in the documentation. I have tried the code in both Python and R, and in both cases the result differs from the tutorial plot.

Steps to Reproduce:

  • Follow the steps in the UMAP Outliers Tutorial.
  • Run the provided Python code with the given parameters.
  • Additionally, I replicated the example using R with the same data and settings.
  • The resulting plots do not match the one shown in the tutorial.

Expected Outcome:
The generated plot should match the one shown in the tutorial, where the UMAP model preserves outliers as outlying, while still retaining the benefits of a union operation. The example plot in the tutorial has specific cluster shapes that I am unable to replicate.
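For context, here is a minimal pure-Python sketch of what set_op_mix_ratio does, as far as I understand it from the UMAP parameter documentation: it interpolates between the fuzzy union and the fuzzy intersection when the directed neighbour graph is symmetrised. The function name mix_set_operations and the toy matrix are my own illustration, not UMAP's API.

```python
def mix_set_operations(a, set_op_mix_ratio):
    """Blend the fuzzy union and fuzzy intersection of a directed
    membership matrix `a` with its transpose (hypothetical helper)."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            prod = a[i][j] * a[j][i]           # fuzzy intersection
            union = a[i][j] + a[j][i] - prod   # fuzzy union
            result[i][j] = (set_op_mix_ratio * union
                            + (1.0 - set_op_mix_ratio) * prod)
    return result


# Toy directed graph (made-up weights): point 2 is an outlier that lists
# point 0 as its neighbour, but point 0 does not list point 2 back.
a = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

for ratio in (1.0, 0.25, 0.0):
    w = mix_set_operations(a, ratio)
    # w[0][2] is the one-sided edge to the outlier, w[0][1] a mutual edge.
    print(ratio, w[0][2], w[0][1])
```

With a ratio of 1.0 the one-sided edge to the outlier keeps its full weight, with 0.25 it drops to a quarter, and with 0.0 it disappears, while the mutual edge between points 0 and 1 stays comparatively strong throughout. That is the behaviour I expected to see reflected in the plot.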

Actual Outcome:
The plot I generate in both Python and R looks different from the one shown in the tutorial. The clusters of 3s and 5s do not merge as they do in the example plot, and the shape of the 1s cluster is closer to the one produced by the UMAP models with set_op_mix_ratio=1. The shapes of the other clusters and the distances between them also do not align with the example plot.
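To make "the 3s and 5s do not merge" less dependent on eyeballing plots, a rough check could compare cluster centroids in the embedding. This is only a sketch: centroid_distance is a hypothetical helper, and the coordinates below are made-up stand-ins for the fitted embedding (mapper.embedding_ in Python). Absolute distances in a UMAP embedding are not directly comparable between runs, so the value is probably only meaningful relative to other label pairs within the same embedding.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def centroid_distance(embedding, labels, label_a, label_b):
    """Euclidean distance between the centroids of two labelled groups."""
    group_a = [p for p, lab in zip(embedding, labels) if lab == label_a]
    group_b = [p for p, lab in zip(embedding, labels) if lab == label_b]
    if not group_a or not group_b:
        raise ValueError("both labels must occur at least once")
    return math.dist(centroid(group_a), centroid(group_b))


# Toy 2-D coordinates (hypothetical values, not real UMAP output).
embedding = [(0.0, 0.0), (0.0, 2.0), (3.0, 5.0), (3.0, 7.0)]
labels = ["3", "3", "5", "5"]

print(centroid_distance(embedding, labels, "3", "5"))
```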

Code:
Python Code:

import numpy as np
import sklearn.datasets
import sklearn.neighbors
import umap
import umap.plot
import matplotlib.pyplot as plt
%matplotlib inline

# Example data from the tutorial
data, labels = sklearn.datasets.fetch_openml('mnist_784', version=1, return_X_y=True)

# UMAP settings
mapper = umap.UMAP(set_op_mix_ratio=0.25).fit(data)

# Plot the result
umap.plot.points(mapper, labels=labels)

R Code:

library(uwot)

# Example data from the tutorial (MNIST IDX files).
# load_image_file() and load_label_file() are helper functions defined elsewhere (not shown here).
set.seed(42)
train <- load_image_file('DATA/train-images-idx3-ubyte')
test <- load_image_file('DATA/t10k-images-idx3-ubyte')

train$y <- load_label_file('DATA/train-labels-idx1-ubyte')
test$y <- load_label_file('DATA/t10k-labels-idx1-ubyte')

# UMAP settings
mapper <- uwot::umap(rbind(train$x,test$x) , n_neighbors = 15, min_dist = 0.1, set_op_mix_ratio = 0.25)

# Plot the result
plot(x = mapper[,1],
     y = mapper[,2],
     col = RColorBrewer::brewer.pal(10, 'Spectral')[cut(c(train$y, test$y), 10)],
     pch = 16, cex = .2
)

Screenshots/Images:
Python Output: [image]

R Output: [image]

Environment:
UMAP Version: 0.5.7
Python Version: 3.10.12
Matplotlib: 3.8.0
NumPy: 1.26.4

R Version: 4.4.1
UWOT: 0.2.2
