Errors in highquality_clust30 #56

Open
valentynbez opened this issue Jul 5, 2024 · 3 comments
valentynbez commented Jul 5, 2024

The database contains empty structures, which cause a segfault when one tries to extract them.
The list of empty structures was obtained by running:

foldcomp check highquality_clust30 --threads 32 &> error_ids.txt

error_ids.txt

khb7840 (Member) commented Jul 8, 2024

Thank you for reporting this. I'm checking this database along with the other issues.

valentynbez (Author) commented

I encountered the same error when creating my own database. It's not clear what the structures that fail to compress have in common.

valentynbez (Author) commented

In both cases the sequences are split at an unusual letter, presumably X:

>phrog_433:protein63813
M
--
>phrog_433:protein63813
KCRKKIFLYREDGTEDIKVIKYKDNVNEVYSLTGAHFSDEKKIMTDSDLKRFKGAHGLLYEQELGLQATIFDI
>MGYP003343806611
MLRIKITDADRAGRAGEWCQANLGRDDWNLYGHNLFTGTPYYEFEFTDSETAMMFALRWA
--
>MGYP003343806611
YY

It appears to be related to #53 and the way AA_UNK_CHAR is processed.
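
For anyone who wants to pre-screen their own input before building a database, here is a minimal Python sketch of the check described above. It assumes the split really does happen at the letter X (that is a guess from the output, not something confirmed in the foldcomp source), and the helper names `read_fasta`, `split_on_unknown` and `flag_records` are hypothetical, not part of foldcomp. It only reads FASTA text and reports which records contain X and what fragments would be left after cutting there.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_on_unknown(sequence, unknown="X"):
    """Return the non-empty fragments left after cutting at every unknown residue."""
    return [fragment for fragment in sequence.split(unknown) if fragment]


def flag_records(lines, unknown="X"):
    """Return {header: fragments} for records whose sequence contains `unknown`."""
    flagged = {}
    for header, sequence in read_fasta(lines):
        if unknown in sequence:
            flagged[header] = split_on_unknown(sequence, unknown)
    return flagged


if __name__ == "__main__":
    # Toy records made up for illustration; they are not taken from the database.
    example = [">toy_1", "MXKCRK", ">toy_2", "MLRIK", ">toy_3", "ACDXXYY"]
    for header, fragments in flag_records(example).items():
        print(header, fragments)
```

Passing an open file handle instead of the toy list works the same way, since `flag_records` only needs an iterable of lines. Records that come back flagged are candidates for the failing entries, under the assumption stated above.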
