Question: Determining Required Percentage Similarity and Handling Database Input #321

Closed
ahfitzpa opened this issue Dec 11, 2023 · 5 comments

@ahfitzpa

ahfitzpa commented Dec 11, 2023

I am planning a virus sequencing project using Readfish. Considering the ONT error rate and the adaptive sampling system, what is the necessary percentage similarity between the reference sequence (database) and the target sequence (expected on the flow cell)? Given the diversity of viruses, I would like to avoid an unwieldy mmi input file and also avoid false hits.

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

@Adoni5
Contributor

Adoni5 commented Dec 12, 2023

I think you could get away with something in the range of 70% based on some work I was doing today.

How many viruses are you looking at in the sample? You could probably get away with using a generic reference for a group of species. That said, if you are looking to differentiate between two very similar species, it might require more thought. Any thoughts, @mattloose?
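
One way to sanity-check a figure like 70% on your own data is to map a few known reads with minimap2 and look at the identity of the hits. Below is a minimal sketch, assuming PAF output (column 10 is the number of matching bases, column 11 is the alignment block length). The function name and the example record are made up for illustration, and this is not something readfish reports itself.

```python
def paf_identity(paf_line: str) -> float:
    """Return matching bases / alignment block length for one PAF record."""
    fields = paf_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated PAF columns")
    n_match = int(fields[9])     # column 10: number of matching bases
    block_len = int(fields[10])  # column 11: alignment block length (gaps included)
    if block_len == 0:
        raise ValueError("alignment block length is zero")
    return n_match / block_len


# Hypothetical record: 700 matching bases over a 1000 bp alignment block.
example = "\t".join(
    ["read1", "1200", "50", "1050", "+", "virusA", "9000", "300", "1300", "700", "1000", "60"]
)
print(f"{paf_identity(example):.2%}")
```

Because the denominator includes gaps, this value is usually a little lower than an identity computed over aligned columns only, so treat any cutoff as approximate.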

@ahfitzpa
Author

I will not know how many viruses are in the samples, as it is a virus discovery project across a wide variety of sample types. Therefore, I cannot take a host depletion approach.
What I am hoping, and will test based on what you are saying, is that AS via ReadFish is pretty permissive, so I can reduce the size of my db by clustering to a specific similarity. I will have fun at the other end of sequencing disentangling similar species anyway because of the ONT error rate, though it is much improved.
The size limits for AS are pretty well documented. Do you think that increasing the time a sequence spends in the pore would permit AS of shorter sequences?
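
For anyone following along, the "cluster the db to a specific similarity" idea mentioned above can be sketched roughly as below. This is only a toy illustration under stated assumptions: it uses k-mer Jaccard similarity as a crude stand-in for percent identity (the two are not the same thing), a greedy longest-first scheme, and made-up sequences and function names. For a real database a dedicated clustering tool such as CD-HIT or MMseqs2 would be the sensible choice.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 5) -> Set[str]:
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; a proxy, not percent identity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: Dict[str, str], threshold: float, k: int = 5) -> Dict[str, List[str]]:
    """Longest-first greedy clustering; returns representative -> members."""
    clusters: Dict[str, List[str]] = {}
    rep_kmers: Dict[str, Set[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        ks = kmers(seqs[name], k)
        for rep, rks in rep_kmers.items():
            if jaccard(ks, rks) >= threshold:
                clusters[rep].append(name)  # close enough: fold into this cluster
                break
        else:
            clusters[name] = [name]  # nothing similar yet: new representative
            rep_kmers[name] = ks
    return clusters


# Hypothetical toy sequences, far shorter than real viral genomes.
toy = {
    "virusA": "ACGTACGTTAGCCGATAGGCTTAACGGATCCA",
    "virusA_variant": "ACGTACGTTAGCCGATAGGCTTAACGGATCC",
    "virusB": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}
representatives = greedy_cluster(toy, threshold=0.7)
print(representatives)
```

Only the representative sequences (the dictionary keys) would then go into the reference that gets indexed, which is where the reduction in mmi size would come from.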

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 15, 2024

This issue was closed because there has been no response for 5 days after becoming stale.

@github-actions github-actions bot closed this as not planned Jan 21, 2024