You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm trying to use OMArK to assess the quality of a genome annotation and to be able to compare multiple ones later.
However, whenever I try to run OMArK with the command omark -f killi_lifton_annot_correct3.omamer -d LUCA.h5 -o killi_lifton -t 105023 -of ../GCF_001465895.1_Nfu_20140520_genomic.fna -v on my dataset I get the following error:
INFO: Starting OMArk
INFO: Input parameters passed validity check
INFO: Extracting data from input file: killi_lifton_annot_correct3.omamer
INFO: Determinating species composition from HOG placements
INFO: A taxid was provided. The query taxon is Nothobranchius furzeri
INFO: Ancestral lineage is Cyprinodontiformes
INFO: Estimating ancestral and conserved HOG content
INFO: 29752 HOGs are associated to the query's lineage and will be used for consistency assesment
INFO: 16815 conserved ancestral HOGs will be used from completeness assesment
INFO: Comparing the query gene repertoire to lineage-associated HOGs
INFO: Comparing the query gene repertoire to conserved ancestral HOGs
INFO: Writing OMArk output files
Traceback (most recent call last):
File "/data/leuven/340/vsc34088/miniconda3/envs/omark/bin/omark", line 51, in <module>
omark.launcher(arg)
File "/data/leuven/340/vsc34088/miniconda3/envs/omark/lib/python3.12/site-packages/omark/omark.py", line 265, in launcher
get_omamer_qscore(omamerfile, dbpath, outdir, taxid, original_FASTA_file = original_fasta, isoform_file=isoform_file, taxonomic_rank=taxonomic_rank)
File "/data/leuven/340/vsc34088/miniconda3/envs/omark/lib/python3.12/site-packages/omark/omark.py", line 149, in get_omamer_qscore
io.store_contaminant_FASTA(stordir, basefile, prot_clade, original_FASTA_file)
File "/data/leuven/340/vsc34088/miniconda3/envs/omark/lib/python3.12/site-packages/omark/files.py", line 307, in store_contaminant_FASTA
seq = seqs_by_id[prot_data[1]]
~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'XM_054735811'
I tried to change the headers of my protein sequences in the omamer file by removing special characters or removing parts of the headers multiple times but I always get the same issue. However, when I remove the specific sequence mentioned in the error the writing continue a little bit and then stop to other sequences. The only link I found between these problematic sequences is the hoglevel (Nothobranchius furzeri in this case) but I don't understand why it's causing an issue.
Here is a little part of my omamer file in which sequences XM_054735811 and XM_054735812 are problematic:
Thank you for using OMArk and reporting this issue.
From a cursory look at this issue, it seems like this may be due to the protein identifiers in the FASTA file provided through the '-of' option (GCF_001465895.1_Nfu_20140520_genomic.fna) and the one in the OMAmer output being different.
Is this FASTA file the original proteome FASTA file that was first provided to OMAmer?
From the file name, it looks like it may be instead a FASTA file of the genomic sequence. If it is the case, please try changing this file to the proteome FASTA file that was used as input to OMAmer.
Let me know if I am mistaken or if changing this file do not solve the issue. If it does not, could you write down a few examples of the headers from this file?
If it does, I'll keep this issue open as an incentive to work on better parameter checking.
Thank you for your quick reply.
I indeed used the FASTA file of the genomic sequence and when I changed this file to the proteome FASTA file the OMArK tool worked.
Dear,
I'm trying to use OMArK to assess the quality of a genome annotation and to be able to compare multiple ones later.
However, whenever I try to run OMArK with the command
omark -f killi_lifton_annot_correct3.omamer -d LUCA.h5 -o killi_lifton -t 105023 -of ../GCF_001465895.1_Nfu_20140520_genomic.fna -v
on my dataset I get the following error:I tried to change the headers of my protein sequences in the omamer file by removing special characters or removing parts of the headers multiple times but I always get the same issue. However, when I remove the specific sequence mentioned in the error the writing continue a little bit and then stop to other sequences. The only link I found between these problematic sequences is the hoglevel (Nothobranchius furzeri in this case) but I don't understand why it's causing an issue.
Here is a little part of my omamer file in which sequences XM_054735811 and XM_054735812 are problematic:
Do you have any idea why I'm experiencing this issue?
Thank you in advance for your help,
Enora
The text was updated successfully, but these errors were encountered: