Suffers from bizarre bug in BioPython #8

Open
rsharris opened this issue Sep 16, 2019 · 1 comment

Comments

@rsharris
Contributor

In rare cases, the fasta headers in the annotated output can lack one of the fields due to a seriously bizarre bug in BioPython's SeqIO.write() function.

This occurs if the sequence's length happens to be the same as the sequence's name. In this case the description DiscoverY generates, which starts with the length, is mis-interpreted inside SeqIO.write() as including the sequence name. And SeqIO.write() does you the 'favor' of removing that duplication.
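The collapse described above can be reproduced with a small stand-in for the header-building step. This is only a sketch: `fasta_title` is a hypothetical, simplified model of the behaviour reported in this issue, not BioPython's actual code, and the description fields after the length (`female`, `0.12`) are made-up placeholders rather than DiscoverY's real annotation fields.

```python
def fasta_title(record_id, description):
    """Simplified model of the reported header rule (an assumption, not BioPython's code)."""
    words = description.split(None, 1)
    if words and words[0] == record_id:
        # The description already seems to start with the id, so it is
        # emitted on its own and the id is not written a second time.
        return description
    if description:
        return record_id + " " + description
    return record_id


# Hypothetical description layout: "<length> <other fields>"
normal = fasta_title("17", "1500 female 0.12")    # name differs from length
clash = fasta_title("1500", "1500 female 0.12")   # name equals length

print(normal)  # 17 1500 female 0.12
print(clash)   # 1500 female 0.12
```

In the second case the header has three fields instead of four, which is the missing field seen in the annotated output.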

This obviously can only happen if the contig names are numbers. Unfortunately for me, the output of whatever assembler created my contigs file does use numbers for names, and one of them happened to match the sequence length.

This is a problem because I was attempting to automatically convert the annotations into a table that I could process with other tools (e.g. R). But the table can't be parsed correctly because of the 'favor' BioPython has done.

The only useful workaround I can see is that users should be warned (in the README) that their sequence names shouldn't be numbers.

@rsharris
Contributor Author

There is a workaround for this, which DiscoverY (and anything else using BioPython to write fasta) should employ. record.id should be included at the start of the description, like this: description = record.id + " " + str(length) + " " ...

See biopython/biopython#2270 for some discussion.
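A minimal sketch of that workaround, under the same assumptions as the issue text: `fasta_title` is a hypothetical simplified model of the header rule, not BioPython's code, and `make_description`, `parse_header` and the extra fields (`female`, `0.12`) are illustrative names and values that do not come from DiscoverY. Because the description always starts with the id, the header keeps the same number of fields whether or not the name equals the length.

```python
def fasta_title(record_id, description):
    """Simplified model of the reported header rule (an assumption, not BioPython's code)."""
    words = description.split(None, 1)
    if words and words[0] == record_id:
        return description
    return record_id + " " + description if description else record_id


def make_description(record_id, length, fields):
    """Build a description that always begins with the record id (the workaround)."""
    return " ".join([record_id, str(length)] + [str(f) for f in fields])


def parse_header(title):
    """Split a header into name, length and remaining fields for a table row."""
    name, length, *rest = title.split()
    return {"name": name, "length": int(length), "fields": rest}


for record_id in ("17", "1500"):
    title = fasta_title(record_id, make_description(record_id, 1500, ["female", "0.12"]))
    print(title, parse_header(title))
```

With this layout a downstream parser can rely on the first field being the name and the second being the length, even for numeric contig names.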
