Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure FPS/CUR selected idxs are unique in the case of zero-score #224

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jwa7
Copy link

@jwa7 jwa7 commented Mar 26, 2024

Attempting to fix #206

Hello!

I have encountered the issue raised by Alex in the above issue, whilst working with the equisolve wrapper for TensorMap-based sample/feature selection (i.e. in https://github.com/lab-cosmo/equisolve/blob/main/src/equisolve/numpy/sample_selection.py and co).

I have adapted Alex's example into a few unit tests for both FPS and CUR sample/feature selection, and attempted to fix it. However, there is something I'm not understanding. While the FPS tests now pass, there are a couple of (different) CUR ones that do not.

With this PR I was hoping to get some feedback/help from the skmatter dev team on this. Thanks! :)

Contributor (creator of PR) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

For Reviewer

  • CHANGELOG updated if important change?

📚 Documentation preview 📚: https://scikit-matter--224.org.readthedocs.build/en/224/

@rosecers
Copy link
Collaborator

Are you still looking for feedback on this? Good measure to tag one of us for input.

@jwa7
Copy link
Author

jwa7 commented Jun 20, 2024

Hey! This isn't something I'm actively working on at the moment, but I'll be sure to ping you for feedback when I / we get back round to it :)

@mhellstr
Copy link

I had the problem that CUR sample selection would give duplicate selected indices, for example X.selected_idx_ ==
[ 801 936 1308 253 1000 480 183 486 303 977 43 366 734 243 363 88 859 440 798 398 709 263 796
383 214 654 854 508 865 384 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643
413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413
643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643 413 643
413 643 413 643 413 643 413 643]

I cannot comment on the validity of the code, but this branch solves this problem to instead give unique predictions like this, X.selected_idx_ ==
[ 801 936 1308 253 1000 480 183 486 303 977 43 366 734 243 363 88 859 440 798 398 709 263 796
383 214 654 854 508 865 384 413 643 216 555 408 564 892 716 673 99 107 386 137 55 1 499
368 390 359 218 237 130 530 661 439 311 318 542 669 830 268 208 215 903 418 855 994 664 631
879 199 306 869 258 884 1418 123 700 266 1282 608 519 351 389 5 652 553 257 934 24 1455 270
825 744 1114 470 757 572 1259 15]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zero scores result in repeated selection and wrong scores at least for FPS
3 participants