When blobtools sees the format in the workshop example, it should automatically recognise it as the output of the diamond blastp step from the blobtools pipeline, treat it as a blastp file, and parse the details in the sequence_id columns accordingly. In this case the tax rule provided really only changes the name of the output fields, so `buscogenes` acts as a label: setting `--taxrule buscogenes` is the same as explicitly setting the blastp tax rule and giving it an alternate name with `--taxrule blastp=buscogenes`.
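To check which of the two ID shapes a hits file uses, something like the following could help. It is only a rough sketch: the regular expression is inferred from the example IDs in this thread and is not blobtools' actual parser, and `split_busco_id` is a made-up helper name.

```python
import re

# Assumed shape, inferred from the workshop example in this thread:
#   <contig>:<start>-<end>=<busco_id>=<status>
# This is an illustration, not the pattern blobtools itself uses.
BUSCO_ID = re.compile(
    r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)"
    r"=(?P<busco>[^=]+)=(?P<status>[^=]+)$"
)


def split_busco_id(seq_id):
    """Return the parts of a BUSCO-style sequence ID, or None if it does not match."""
    match = BUSCO_ID.match(seq_id)
    return match.groupdict() if match else None


# ID from the workshop data: carries the BUSCO gene and status fields.
print(split_busco_id("ptg000043l:272955-274207=1275837at2759=single"))
# ID from the file in this issue: no '=' fields, so it does not match.
print(split_busco_id("ptg016977l:1-7766"))
```

An ID that returns `None` here has only the `contig:start-end` part, which is the case described in this issue.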
The sequence IDs in your file look to be missing the `=1275837at2759=single` part that adds the BUSCO gene information, so the import is treating the sequence ID as being `ptg016977l:1-7766`. As you thought, removing the `:` and everything after it should fix the problem, because the sequence IDs will then match the sequence IDs from the assembly FASTA file.
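One way to do that trimming is sketched below. It assumes the hits file is whitespace-delimited with no spaces inside any field, and that the IDs sit in columns 1 and 4 as in the file shown in this issue; `strip_region` and `fix_line` are made-up helper names, not part of blobtools.

```python
def strip_region(seq_id):
    """Drop the ':' and everything after it: 'ptg016977l:1-7766' -> 'ptg016977l'."""
    return seq_id.split(":", 1)[0]


def fix_line(line, id_columns=(0, 3)):
    """Trim the ID columns (columns 1 and 4, i.e. zero-based 0 and 3) of one hits line.

    Assumes no field contains internal whitespace; output is tab-separated.
    """
    fields = line.split()
    for index in id_columns:
        if index < len(fields):
            fields[index] = strip_region(fields[index])
    return "\t".join(fields)


# First row of the file from this issue, used as a quick check.
example = (
    "ptg016977l:1-7766 121845 223.4 ptg016977l:1-7766 "
    "tr|A0A3Q0J621|A0A3Q0J621_DIACI 93.3 119 8 0 5149 4793 25 143 4.1e-54 223.4"
)
print(fix_line(example))
```

To convert a whole file, apply `fix_line` to each non-empty line and write the results to a new file, keeping the original unchanged.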
I have been trying to provide hits to blobtools via a blastx search of BUSCO gene regions against UniProt. However, all contigs are being assigned to no-hit despite there being taxid hit info in the input file:
```
ptg016977l:1-7766 121845 223.4 ptg016977l:1-7766 tr|A0A3Q0J621|A0A3Q0J621_DIACI 93.3 119 8 0 5149 4793 25 143 4.1e-54 223.4
ptg016977l:1-7766 121845 161.4 ptg016977l:1-7766 tr|A0A3Q0J621|A0A3Q0J621_DIACI 85.4 82 12 0 1757 1512 179 260 1.9e-35 161.4
ptg004722l:33979-35044 28743 468.8 ptg004722l:33979-35044 tr|A0A3Q2GQF1|A0A3Q2GQF1_CYPVA 65.5 354 106 1 1 1062 8 345 7.7e-129 468.8
ptg008147l:570-8350 7740 451.1 ptg008147l:570-8350 tr|A0A8K0EU27|A0A8K0EU27_BRALA 42.7 536 294 5 3707 5278 579 1113 1.2e-122 451.1
```
A BUSCOgenes taxrule was mentioned in the recent workshop, but I can't find a reference to it anywhere. I am wondering if I need to remove the text following the ':' characters in columns 1 and 4. However, the test data provided in the recent workshop had text in addition to the contig names following ':' and provided hits to the plot without issue:
```
ptg000043l:272955-274207=1275837at2759=single 1903189 553 ptg000043l:272955-274207=1275837at2759=single tr|A0A8H3G8J7|A0A8H3G8J7_9LECA 93.4 316 15 3 1 315 1 311 2.01e-196 553
ptg000043l:272955-274207=1275837at2759=single 560253 553 ptg000043l:272955-274207=1275837at2759=single tr|A0A8H6CAM3|A0A8H6CAM3_9LECA 93.4 316 15 3 1 315 1 311 2.85e-196 553
ptg000043l:272955-274207=1275837at2759=single 112416 553 ptg000043l:272955-274207=1275837at2759=single tr|A0A8H6G095|A0A8H6G095_9LECA 93.4 316 15 3 1 315 1 311 2.85e-196 553
ptg000043l:272955-274207=1275837at2759=single 172621 549 ptg000043l:272955-274207=1275837at2759=single tr|A0A8H3FIP3|A0A8H3FIP3_9LECA 93.0 316 16 3 1 315 1 311 1.45e-194 549
```