highquality_cluster30 - fragmented sequences split on undetermined amino acid
#53
Labels
help wanted
Hello!
I've tried using `highquality_clust30` as a reference and identified the following issue. The database has around 200k repeated entries; they appear to be fragmented proteins split on the `X` amino acid. (The additional information from the headers was removed; only unique MG IDs are stored in my FASTAs for indexing with `samtools-faidx`.)
Example 1
When I query the ESM API, I get:
Example 2
ESM API
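For anyone who wants to check their own copy, here is a minimal sketch of how the repeated IDs could be detected. It assumes a plain FASTA file in which the first word of each header is the ID. The function names and the example IDs are made up for illustration and are not part of the database or any library. `rejoin_on_x` assumes that fragments appear in their original order and were separated by exactly one `X`, which has not been confirmed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the ID is the first word of the header."""
    seq_id = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def repeated_ids(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each ID that occurs more than once to its sequences, in file order."""
    by_id: Dict[str, List[str]] = defaultdict(list)
    for seq_id, seq in records:
        by_id[seq_id].append(seq)
    return {k: v for k, v in by_id.items() if len(v) > 1}


def rejoin_on_x(fragments: List[str]) -> str:
    """Hypothetical reconstruction: join fragments in file order with 'X'.

    Assumption (unverified): exactly one X separated each pair of fragments.
    """
    return "X".join(fragments)
```

Passing an open file handle to `read_fasta` and the result to `repeated_ids` gives the set of duplicated IDs; `len()` of that dictionary can then be compared against the roughly 200k figure above.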