Fix BLAST results protein to taxonomic accession assignment #317
Comments
This could be handled in either of two places: when parsing taxids for NCBI, or at the LCA step. Any preference?
Autometa/autometa/taxonomy/lca.py
Lines 365 to 399 in baf61c0
# discard root taxid from set of query's taxids before #L370
qseqid_taxids.discard(root_taxid)
# or something like this?
from autometa.taxonomy.database import TaxonomyDatabase
qseqid_taxids.discard(TaxonomyDatabase.UNCLASSIFIED_TAXID)
Autometa/autometa/taxonomy/lca.py
Line 370 in baf61c0
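For comparison, here is a minimal standalone sketch of the LCA-side option. It does not use Autometa's actual code: the function name `drop_root_taxids`, the `hits` mapping, and the ORF names are hypothetical, and it assumes unresolved accessions arrive as the NCBI root taxid (1). Queries whose only taxid was root are omitted so they are never passed to LCA.

```python
from typing import Dict, Set

ROOT_TAXID = 1  # NCBI root taxid; assumed here to mark unresolved accessions


def drop_root_taxids(
    hits: Dict[str, Set[int]], root_taxid: int = ROOT_TAXID
) -> Dict[str, Set[int]]:
    """Return a copy of hits with root_taxid removed from each query's taxids.

    Queries left with no taxids are omitted entirely.
    """
    cleaned: Dict[str, Set[int]] = {}
    for qseqid, taxids in hits.items():
        remaining = taxids - {root_taxid}  # set difference leaves the input untouched
        if remaining:
            cleaned[qseqid] = remaining
    return cleaned


# Hypothetical example: orf_2 only hit proteins that fell back to root.
hits = {"orf_1": {1, 561, 562}, "orf_2": {1}}
cleaned = drop_root_taxids(hits)
```

One caveat with this option: it cannot distinguish a missing accession from a hit that is genuinely assigned to root, which is an argument for handling the problem at parsing time instead.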
Currently the documentation instructs, and the code downloads, prot.accession2taxid.gz, which doesn't have all of the nr accessions. Proteins that aren't found in prot.accession2taxid.gz are assigned to root, which results in contigs becoming unclassified.

Currently this is ameliorated by using prot.accession2taxid.FULL.gz instead of prot.accession2taxid.gz, as shown below. But the code needs to be changed to handle missing accessions. Per our meeting today these should probably be assigned to None and then dropped before handing over to LCA.

Assignment to root that needs to be changed:
Autometa/autometa/taxonomy/ncbi.py
Lines 453 to 457 in baf61c0
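
The proposed behavior (assign None, then drop before LCA) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in ncbi.py: `assign_taxids`, `drop_missing`, the accession strings, and the in-memory `acc2taxid` dict are all hypothetical stand-ins for the parsed accession2taxid table.

```python
from typing import Dict, Iterable, Optional


def assign_taxids(
    accessions: Iterable[str], acc2taxid: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Map each accession to its taxid, or None when it is absent from the table."""
    # dict.get returns None for a missing key, instead of falling back to root
    return {acc: acc2taxid.get(acc) for acc in accessions}


def drop_missing(assigned: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Keep only accessions that resolved to a taxid."""
    return {acc: taxid for acc, taxid in assigned.items() if taxid is not None}


# Hypothetical lookup table and BLAST hit accessions.
acc2taxid = {"ACC_A": 562, "ACC_B": 1280}
assigned = assign_taxids(["ACC_A", "ACC_B", "ACC_MISSING"], acc2taxid)
resolved = drop_missing(assigned)
```

Keeping None distinct from root means a protein that is truly assigned to root still reaches LCA, while unresolved accessions do not.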