Hi, I am using the example sequence data included in the MMseqs2 release to demonstrate this issue. Please tell me what we need to do to update a cluster db correctly. We use clusterupdate to update our cluster database, and it is crucial that the update does not miss any sequences.
I have tested two versions including the latest version:
bin/mmseqs version
b804fbe
I constructed the clusters exactly as in the mmseqs command-line example: I first clustered an old sequence db and then updated the resulting cluster db using a new sequence db. The updated cluster db is missing sequence entries that are still present in the new sequence db.
Here are the commands I used:
bin/mmseqs createdb <(head -n 10000 examples/DB.fasta) sequenceDB
bin/mmseqs cluster sequenceDB clusterDB tmp
bin/mmseqs createdb <(tail -n +1001 examples/DB.fasta | head -n 15000) updateSequenceDB
mmseqs clusterupdate sequenceDB updateSequenceDB clusterDB newSequenceDB newClusterDB tmp
bin/mmseqs createtsv newSequenceDB newSequenceDB newClusterDB newClusterDB.tsv
So newClusterDB is the updated cluster db. It contains only 7313 sequence entries (i.e., cluster members), while the new sequence db contains 7500 sequences.
wc newClusterDB.tsv
7313 14626 123022 newClusterDB.tsv
wc updateSequenceDB.lookup
7500 22500 114486 updateSequenceDB.lookup
wc newSequenceDB.lookup
7500 22500 115124 newSequenceDB.lookup
Here is one example sequence (X7SER2) that is missing from newClusterDB:
It is in the updateSequenceDB that is used to update the cluster db:
grep X7SER2 updateSequenceDB.lookup
5962 X7SER2 0
It is also in the new sequence db after cluster update:
grep X7SER2 newSequenceDB.lookup
2152 X7SER2 0
Yet it is missing in the resulting cluster db:
grep X7SER2 newClusterDB.tsv
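To list every missing entry rather than a single example, a small helper along these lines could be used. It is only a sketch and rests on two assumptions: that the second tab-separated column of the .lookup file holds the accession (as the grep output above suggests), and that the second column of the createtsv output holds the cluster member. The function name missing_members is made up for illustration and is not part of mmseqs.

```shell
# Hypothetical helper (not an mmseqs command): print accessions that are
# present in a .lookup file but never appear as a member in a cluster TSV.
# Assumed layouts: lookup = id<TAB>accession<TAB>file,
#                  tsv    = representative<TAB>member
missing_members() {
    lookup=$1
    tsv=$2
    awk -F'\t' '
        # First file (the TSV): remember every cluster member.
        FILENAME == ARGV[1] { clustered[$2] = 1; next }
        # Second file (the lookup): print accessions not seen as a member.
        !($2 in clustered) { print $2 }
    ' "$tsv" "$lookup"
}

# Example call with the files from this report:
# missing_members newSequenceDB.lookup newClusterDB.tsv > missing_ids.txt
```

If each member appears exactly once in newClusterDB.tsv, the resulting list should have 7500 - 7313 = 187 lines, with X7SER2 among them.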