rpalmavejares changed the title from "Odd Representative Cluster Behaviour / Selecction" to "Odd Representative Cluster Behaviour / Selection" on Nov 14, 2024
Expected Behavior
Sequences that are core cluster sequences (cluster representatives) should be output as belonging only to their own cluster, not to a different one.
The example is pretty simple:
TSC000_k99_1536813_gene1 TSC000_k99_1536813_gene1
TSC000_k99_1536813_gene1 TSC002_k99_986141_gene1
TSC000_k99_319273_gene1 TSC000_k99_319273_gene1
TSC000_k99_1362901_gene1 TSC000_k99_1362901_gene1
TSC000_k99_143397_gene1 TSC000_k99_143397_gene1
In this case, all first-column sequences are core cluster sequences, and each one appears in both columns on the first line of its cluster's list.
A core cluster sequence should appear in the second column only on that self-pair line (here, lines 1, 3, 4, and 5). Appearing in the second column anywhere else would indicate that the core cluster sequence is also a member of a different cluster.
Current Behavior
I am seeing odd behavior and I don't know how to interpret the result.
Some core cluster sequences also appear as members of other clusters:
TSC053_k99_1024271_gene1 TSC040_k99_1291964_gene1
TSC053_k99_1024271_gene1 TSC045_k99_976664_gene1
TSC047_k99_1354130_gene1 TSC053_k99_1024271_gene1
Notice that sequence TSC053_k99_1024271_gene1 has no self-pair line of the form:
TSC053_k99_1024271_gene1 TSC053_k99_1024271_gene1
In addition, sequence TSC053_k99_1024271_gene1 is output as a member of another cluster:
TSC047_k99_1354130_gene1 TSC047_k99_1354130_gene1
TSC047_k99_1354130_gene1 TSC053_k99_1024271_gene1
As you can see, TSC047_k99_1354130_gene1 has the expected self-pair line.
The problem is that both TSC053_k99_1024271_gene1 and TSC047_k99_1354130_gene1 appear in the representative sequence output file, even though one is listed as a member of the other's cluster.
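The two symptoms described above (a representative with no self-pair line, and a representative listed as a member under a different representative) can be counted directly from the TSV. The following is a minimal sketch, assuming the file has the representative in the first column and the member in the second, as in the excerpts above. The function names and the file path are hypothetical and are not part of MMseqs2.

```python
from collections import defaultdict


def read_pairs(path):
    """Yield (representative, member) tuples from a two-column cluster TSV.

    Assumes whitespace-separated columns and IDs without internal whitespace.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                yield fields[0], fields[1]


def find_nested_representatives(pairs):
    """Return (missing_self, nested) for an iterable of (rep, member) pairs.

    missing_self: representatives that never appear paired with themselves.
    nested: (rep, member) pairs where the member is itself a representative
            of a different cluster.
    """
    members_of = defaultdict(set)
    for rep, member in pairs:
        members_of[rep].add(member)

    reps = set(members_of)
    missing_self = sorted(r for r in reps if r not in members_of[r])
    nested = sorted(
        (rep, member)
        for rep, members in members_of.items()
        for member in members
        if member != rep and member in reps
    )
    return missing_self, nested


# The lines quoted in this issue, used as a small in-memory example.
example_pairs = [
    ("TSC053_k99_1024271_gene1", "TSC040_k99_1291964_gene1"),
    ("TSC053_k99_1024271_gene1", "TSC045_k99_976664_gene1"),
    ("TSC047_k99_1354130_gene1", "TSC047_k99_1354130_gene1"),
    ("TSC047_k99_1354130_gene1", "TSC053_k99_1024271_gene1"),
]
missing_self, nested = find_nested_representatives(example_pairs)
print(missing_self)
print(nested)
```

On the example lines, TSC053_k99_1024271_gene1 is reported both as missing its self-pair and as nested under TSC047_k99_1354130_gene1. To scan the full output, pass `read_pairs(...)` with the path to the cluster TSV instead of `example_pairs`.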
Steps to Reproduce (for bugs)
These are the commands that I ran:
$mmseqs createdb $1 NEW_GENE_CATALOG/tara_source.mmseqs.db --dbtype 2 --shuffle 0
$mmseqs cluster NEW_GENE_CATALOG/tara_source.mmseqs.db NEW_GENE_CATALOG/tara_source.mmseqs.cluster ./tmp --remove-tmp-files 0 --kmer-per-seq-scale 0 --cluster-mode 2 --min-seq-id 0.95 --threads 20 --cov-mode 1 -c 0.9 --split-memory-limit 700G
$mmseqs createsubdb NEW_GENE_CATALOG/tara_source.mmseqs.cluster NEW_GENE_CATALOG/tara_source.mmseqs.db NEW_GENE_CATALOG/tara_source.mmseqs.rep
$mmseqs convert2fasta NEW_GENE_CATALOG/tara_source.mmseqs.rep NEW_GENE_CATALOG/tara_source.mmseqs.rep.fasta
$mmseqs createtsv NEW_GENE_CATALOG/tara_source.mmseqs.db NEW_GENE_CATALOG/tara_source.mmseqs.db NEW_GENE_CATALOG/tara_source.mmseqs.cluster NEW_GENE_CATALOG/tara_source.mmseqs.cluster.tsv
Your Environment
I'm using MMseqs2 Version: 15.6f452
How should I interpret this result? Is it a bug or an outlier? (I have 288 confirmed instances of this same issue in the TSV output.) Should I keep the good cluster and merge the bad one into it?
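For reference, the merge option mentioned above could look like the sketch below. It folds any cluster whose representative is listed as a member elsewhere into the cluster that lists it. This only illustrates the post-processing idea; it does not establish whether merging is the correct interpretation of the output. The function name is hypothetical, and the surviving representative is simply whichever one lists the other.

```python
def merge_nested_clusters(pairs):
    """Merge clusters whose representative is a member of another cluster.

    pairs: iterable of (representative, member) tuples.
    Returns a dict mapping each surviving representative to its member set.
    """
    pairs = list(pairs)
    reps = {rep for rep, _ in pairs}
    parent = {rep: rep for rep in reps}

    def find(rep):
        # Follow parent links to the surviving representative (path halving).
        while parent[rep] != rep:
            parent[rep] = parent[parent[rep]]
            rep = parent[rep]
        return rep

    for rep, member in pairs:
        if member != rep and member in reps:
            # Absorb the nested representative's cluster into the listing one.
            parent[find(member)] = find(rep)

    merged = {}
    for rep, member in pairs:
        merged.setdefault(find(rep), set()).update((rep, member))
    return merged


example_pairs = [
    ("TSC053_k99_1024271_gene1", "TSC040_k99_1291964_gene1"),
    ("TSC053_k99_1024271_gene1", "TSC045_k99_976664_gene1"),
    ("TSC047_k99_1354130_gene1", "TSC047_k99_1354130_gene1"),
    ("TSC047_k99_1354130_gene1", "TSC053_k99_1024271_gene1"),
]
for rep, members in merge_nested_clusters(example_pairs).items():
    print(rep, sorted(members))
```

On the example lines this yields a single cluster under TSC047_k99_1354130_gene1 containing all four sequences. Note that merged members were only compared against their original representative, so they are not guaranteed to satisfy the `--min-seq-id` and `-c` thresholds relative to the surviving one.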