Sequence entry is dropped from cluster after mmseqs clusterupdate workflow #961

jianye00 · 2025-02-22T20:25:49Z

Hi, I am using the example sequence data included in the MMseqs2 release to demonstrate this issue. Please tell me what we need to do to update a cluster DB correctly. We use clusterupdate to keep our cluster database up to date, and it is crucial that the update does not miss any sequences.

I have tested two versions, including the latest one:
bin/mmseqs version
b804fbe

I built the clusters exactly as in the mmseqs command-line example: I first cluster an old sequence DB, then update the resulting cluster DB with a new sequence DB. The updated cluster DB is missing sequence entries that are still present in the new sequence DB.

Here are the commands I used:

bin/mmseqs createdb <(head -n 10000 examples/DB.fasta) sequenceDB
bin/mmseqs cluster sequenceDB clusterDB tmp
bin/mmseqs createdb <(tail -n +1001 examples/DB.fasta | head -n 15000) updateSequenceDB
mmseqs clusterupdate sequenceDB updateSequenceDB clusterDB newSequenceDB newClusterDB tmp
bin/mmseqs createtsv newSequenceDB newSequenceDB newClusterDB newClusterDB.tsv

newClusterDB is the updated cluster DB. It contains only 7313 sequence entries (i.e., cluster members), while the new sequence DB contains 7500 sequences:

wc newClusterDB.tsv
7313 14626 123022 newClusterDB.tsv

wc updateSequenceDB.lookup
7500 22500 114486 updateSequenceDB.lookup
wc newSequenceDB.lookup
7500 22500 115124 newSequenceDB.lookup
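
To list exactly which accessions are in the new sequence DB but absent from the updated clustering, rather than only comparing wc counts, a small set-difference sketch could look like this. missing_members is a hypothetical helper, not part of MMseqs2, and it assumes that both the .lookup file and the createtsv output are tab-separated and that column 2 of each uses the same identifier.

```shell
# Hypothetical helper (not part of MMseqs2): print accessions listed in a
# .lookup file that never appear as a member in a createtsv output file.
#   $1: lookup file, assumed tab-separated (internal key, accession, file number)
#   $2: createtsv output, assumed tab-separated (representative, member)
missing_members() {
  # Read the TSV first (FILENAME == ARGV[1]) and remember every member ID.
  # Comparing FILENAME instead of using NR == FNR keeps this correct even
  # when the TSV file is empty.
  awk -F'\t' '
    FILENAME == ARGV[1] { member[$2] = 1; next }
    !($2 in member)     { print $2 }
  ' "$2" "$1"
}
```

Usage: missing_members newSequenceDB.lookup newClusterDB.tsv > missing.txt. If every member appears exactly once in the TSV, missing.txt should hold 7500 - 7313 = 187 accessions.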

Here is one example sequence (X7SER2) that is missing from newClusterDB.
It is present in updateSequenceDB, which was used to update the cluster DB:

grep X7SER2 updateSequenceDB.lookup
5962 X7SER2 0

It is also present in the new sequence DB produced by clusterupdate:

grep X7SER2 newSequenceDB.lookup
2152 X7SER2 0

Yet it is missing from the resulting cluster DB (this grep prints nothing):
grep X7SER2 newClusterDB.tsv
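
To check whether dropped entries such as X7SER2 were already in the original sequenceDB or only appear in updateSequenceDB, the missing accessions can be labelled against sequenceDB.lookup. classify_missing is another hypothetical helper, not part of MMseqs2, under the same tab-separated lookup assumption; it only narrows down which group of sequences is affected and makes no claim about how clusterupdate treats either group.

```shell
# Hypothetical helper (not part of MMseqs2): read accessions on stdin, one per
# line, and label each "old" if it is listed in the given .lookup file
# (e.g. the original sequenceDB.lookup) or "new" otherwise.
#   $1: lookup file, assumed tab-separated (internal key, accession, file number)
classify_missing() {
  # First file (the lookup) fills the set of old accessions; the second
  # input ("-", i.e. stdin) is the list of accessions to label.
  awk -F'\t' '
    FILENAME == ARGV[1] { old[$2] = 1; next }
    { print $1 "\t" (($1 in old) ? "old" : "new") }
  ' "$1" -
}
```

Usage: echo X7SER2 | classify_missing sequenceDB.lookup. Any list of accessions, one per line, can be piped in the same way.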

jianye00 mentioned this issue on Feb 26, 2025 in "The effect of --min-seq-id parameter" #959 (open).