Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero scores result in repeated selection and wrong scores at least for FPS #206

Open
agoscinski opened this issue Aug 8, 2023 · 0 comments · May be fixed by #224
Open

Zero scores result in repeated selection and wrong scores at least for FPS #206

agoscinski opened this issue Aug 8, 2023 · 0 comments · May be fixed by #224
Labels
bug Something isn't working

Comments

@agoscinski
Copy link
Collaborator

Detected by @PicoCentauri

Problem

import numpy as np
from skmatter.feature_selection import FPS

np.random.seed(0)
n_samples = 10
n_features = 15
X = np.random.rand(n_samples , n_features )
X[:, 3] = np.random.rand(10) * 1e-13
X[:, 4] = np.random.rand(10) * 1e-13
selector_problem = FPS(n_to_select=len(X.T)).fit(X)
print(selector_problem.selected_idx_)
print(selector_problem.get_select_distance())
print()

# this selector does not have the problem because we stop before the score threshold
selector = FPS(n_to_select=len(X.T), score_threshold=1e-9).fit(X)
print(selector.selected_idx_)
print(selector.get_select_distance())

Out:

[ 0  8  3  6 14  2 13  9  7 11  1 10 12  5  8]
[           inf 1.77635684e-15 2.16390745e+00 1.62400552e+00
 1.43445978e+00 1.23482177e+00 1.03370164e+00 9.21863706e-01
 7.95155761e-01 7.87817521e-01 7.37837489e-01 6.52674372e-01
 6.11845170e-01 5.65607255e-01 1.77635684e-15]
/home/alexgo/code/scikit-matter/src/skmatter/_selection.py:210: UserWarning: Score threshold of 1e-09 reached.Terminating search at 14 / 15.
  warnings.warn(
[ 0  8  3  6 14  2 13  9  7 11  1 10 12]
[       inf 2.75832232 2.16390745 1.62400552 1.43445978 1.23482177
 1.03370164 0.92186371 0.79515576 0.78781752 0.73783749 0.65267437
 0.61184517]

You can see in the first selector that 8 is reselected and sets the wrong score. This is because we do not filter for not selected points in the GreedySelector base class when choosing the next point.

max_score_idx = np.argmax(scores)

So when the scores are all (numerical) zero, then points that have been already selected can be reselected.

Solution

One could add selected_idx_ to the GreedySelector base class and change
the argmax in the function above that it only considers the not selected indices.

@agoscinski agoscinski added the bug Something isn't working label Aug 8, 2023
@jwa7 jwa7 linked a pull request Mar 26, 2024 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant