Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection::from_record does not respect scaled #3384

Closed
ctb opened this issue Nov 10, 2024 · 5 comments · Fixed by #3387
Closed

Selection::from_record does not respect scaled #3384

ctb opened this issue Nov 10, 2024 · 5 comments · Fixed by #3387
Labels

Comments

@ctb
Copy link
Contributor

ctb commented Nov 10, 2024

In selection.rs, Selection::from_record does not pull the scaled value in from the Record:

pub fn from_record(row: &Record) -> Result<Self> {
Ok(Self {
ksize: Some(row.ksize()),
abund: Some(row.with_abundance()),
moltype: Some(row.moltype()),
num: None,
scaled: None,
containment: None,
picklist: None,
})

Among perhaps other side effects, this means that Collection::sig_from_record does not return an appropriately downsampled signature.

This confusion was the source of a bug in branchwater - sourmash-bio/sourmash_plugin_branchwater#505.

@ctb ctb added the rust label Nov 10, 2024
@ctb
Copy link
Contributor Author

ctb commented Nov 10, 2024

Interestingly we apparently figured this out before, because @bluegenes noted it in a test 😆

for (idx, _rec) in cl.iter() {
// need to pass select again here so we actually downsample
let this_sig = cl.sig_for_dataset(idx).unwrap().select(&selection).unwrap();
let this_mh = this_sig.minhash().unwrap();
assert_eq!(this_mh.scaled(), 100);

@ctb
Copy link
Contributor Author

ctb commented Nov 10, 2024

Tracking it down a bit more, it looks like maybe this is intentional - Collection::select delegates to Manifest::select which merely selects rows with compatible scaled values, but does not actually set the scaled value in the Record.

valid = if let Some(scaled) = selection.scaled() {
// num sigs have row.scaled = 0, don't include them
valid && row.scaled != 0 && row.scaled <= scaled
} else {

@ctb
Copy link
Contributor Author

ctb commented Nov 11, 2024

The contortions I'm going through over in sourmash-bio/sourmash_plugin_branchwater#504 make me think that we definitely want to provide a way to request downsampling on sig_from_record 😆

It felt related to sourmash-bio/sourmash_plugin_branchwater#501 to me, too, but after rereading everything I'm not sure. Sigh. Complexities.

@ctb
Copy link
Contributor Author

ctb commented Nov 12, 2024

per slack, luiz speaketh:

I think this is a straight bug, should be setting scaled and num too!

(later conversation: we're not sure about num ;))

@ctb
Copy link
Contributor Author

ctb commented Nov 13, 2024

Fixed by #3387.

@ctb ctb closed this as completed in #3387 Nov 13, 2024
@ctb ctb closed this as completed in 97e7808 Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant