sourmash gather raises AssertionError on minimal test dataset #2825
Comments
Thanks! I see the same error with commit 720962c, the latest
I'm getting some intriguing errors with
as well. It seems likely that there is a bug related to the handling of many different scaled values: here the query metagenome has a scaled value of 10, while the database contains three different scaled values, 1, 20, and 1600 (!?):
Yes, scaling is a bit weird in this case 😅. However, these sketches work well with the larger database (GTDB), so I believed the issue might not be related to the different scaling.
Sorry, I'm still tracking this down! I understand where the problem lies but not exactly why it's happening - working on it over in #2832. But I wanted to respond to this:
Note that for comparison purposes, everything needs to be at the same scaled, so all samples are scaled "up" to the highest scaled value needed for the comparison. This operation is always possible on FracMinHash sketches (just like reducing [...]). There is a brief technical discussion of this in the internals guide, and a more complete (but academic) discussion in Irber et al., 2022.
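To make the scaling-up step concrete, here is a minimal pure-Python sketch of the idea. It is not sourmash's actual implementation: the function names (`max_hash_for_scaled`, `downsample`, `common_scaled`) are hypothetical, and the threshold formula (`2**64 // scaled`) is a simplifying assumption about how a FracMinHash sketch picks which hashes to keep.

```python
from __future__ import annotations

# Simplified model of FracMinHash downsampling; NOT sourmash's real code.
MAX_HASH_SPACE = 2**64  # assumption: hashes live in a 64-bit space


def max_hash_for_scaled(scaled: int) -> int:
    """Upper bound on retained hash values at a given scaled (simplified)."""
    if scaled < 1:
        raise ValueError("scaled must be >= 1")
    return MAX_HASH_SPACE // scaled


def downsample(hashes: set[int], from_scaled: int, to_scaled: int) -> set[int]:
    """Keep only the hashes that would also be retained at to_scaled."""
    if to_scaled < from_scaled:
        # Going to a lower scaled would need hashes that were never stored.
        raise ValueError("cannot downsample to a lower scaled value")
    threshold = max_hash_for_scaled(to_scaled)
    return {h for h in hashes if h < threshold}


def common_scaled(*scaled_values: int) -> int:
    """Sketches are compared at the highest scaled among them."""
    return max(scaled_values)


# Scaled values from this thread: query at 10, database at 1, 20, and 1600.
target = common_scaled(10, 1, 20, 1600)

# Toy hash values for a query sketch built at scaled=10.
query_hashes = {1, 2**64 // 20 + 5, 2**64 // 1600 - 1}
reduced = downsample(query_hashes, from_scaled=10, to_scaled=target)
```

In this toy case the comparison would run at scaled=1600, and only the two query hashes below the scaled=1600 threshold survive. This also shows why the reverse direction is not possible: hashes above the original threshold were never kept.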
Dear sourmash team,
I've recently attempted to create a minimal test dataset and encountered an error when using sourmash gather. Specifically, it raises an AssertionError.
I think that the error might be due to the presence of very few sketches in common between the sample and the database.
The error appears even with --threshold-bp 0.
Expected Behavior:
While this may be a marginal case, I believe the expected behavior in this scenario is for sourmash gather to return an empty CSV file with no results, and subsequently an exit code of 0.
Actual Behavior:
I get an AssertionError when running the command. Sourmash fails after Prefetch and "Doing gather to generate minimum metagenome cover".
Additional Information:
With the larger database (gtdb-rs214-reps.k31.zip), it works as expected without any issues.
Here is the full output log:
The command I executed was:
I've uploaded the test data here:
Thank you for providing such a valuable tool!
With kind regards,
Vladimir
PS. As this was a small test dataset, I used scaled=10 to generate the sample signatures, and the database signatures were also created with a different scaled setting.
Given that it works correctly with a larger database, the issue likely isn't related to the scaling difference.