
aragorn_out_to_gff3.py error when parsing genomes with tmRNAs #1482

Open
cdshaffer opened this issue Aug 16, 2024 · 0 comments

Comments

@cdshaffer

I was running the "CPT Phage Structural Workflow v2024.1 shared by user jasongill" on the "https://phage.usegalaxy.eu/" Galaxy instance with a phage genome and got an error in the workflow, which I traced back to the tool aragorn_out_to_gff3.py.

Here is the genome that is giving the error:
Elmer.fasta.txt

Here are the details from Galaxy:

Galaxy Tool ID | toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/aragorn_trna/0.6

This is the command line that returned the error:
aragorn '/data/dnb10/galaxy_db/files/3/4/8/dataset_348bacf1-a138-43a0-a09b-6be8e8854bac.dat' -gc11 -m -t -c -w | python '/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py' false > '/data/jwd02f/main/072/526/72526578/outputs/dataset_39c2a2b6-0d3f-4e62-bb00-c465960c860d.dat'

and this traceback:
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/trna_prediction/358f58401cd6/trna_prediction/aragorn_out_to_gff3.py", line 66, in <module>
    aa_short = aa_table[aa_long]
KeyError: ''

I believe the error comes from the tmRNA that is present in the genome. I ran aragorn locally using this command:

» aragorn Elmer.fasta -gc11 -m -t   -c  -w | tail -8
38  tRNA-Arg                 [99283,99355]	34  	(acg)
39  tmRNA                  [100524,100803]	91,132	ANSNVASAYALAA*
40  tRNA-Leu               [108208,108281]	34  	(taa)
41  tRNA-Leu               [108598,108680]	34  	(gag)
42  tRNA-Val               [109257,109328]	33  	(gac)
43  tRNA-Leu               [110275,110352]	36  	(caa)
44  tRNA-Ser               [110601,110692]	37  	(gct)
45  tRNA-Gln               [117516,117591]	35  	(ttg)

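When aragorn_out_to_gff3.py parses this output, it splits each line into fields and builds aa_long with aa_long = data[1][5:]. For tRNA lines this slice returns the three-letter amino acid code (e.g. "Arg" from "tRNA-Arg"), but on the tmRNA line data[1] is "tmRNA", which is only five characters long, so the slice returns an empty string. That empty string is the missing key in the KeyError: '' shown in the traceback above. In fact, a brief look over the aragorn_out_to_gff3.py code suggests it does not parse tmRNA lines at all, even though, as you can see above, they are formatted quite differently from tRNA lines (a tag-peptide position and peptide sequence instead of an anticodon position and anticodon).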
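To make the failure concrete, here is a minimal reproduction of the slicing step. It assumes the script splits each line on whitespace, which the data[1][5:] expression suggests; aa_long_from_line and the two-entry aa_table are hypothetical stand-ins, not the script's real code:

```python
# Two gene lines copied from the aragorn -w output above
# (whitespace-separated columns, with the tabs aragorn prints).
TRNA_LINE = "38  tRNA-Arg                 [99283,99355]\t34  \t(acg)"
TMRNA_LINE = "39  tmRNA                  [100524,100803]\t91,132\tANSNVASAYALAA*"

# Hypothetical subset of the script's amino-acid table, for illustration only.
aa_table = {"Arg": "R", "Leu": "L"}


def aa_long_from_line(line: str) -> str:
    """Mimic the reported expression aa_long = data[1][5:].

    Assumes the line is split on whitespace, so data[1] is the gene-type
    column; slicing off five characters strips the "tRNA-" prefix.
    """
    data = line.split()
    return data[1][5:]


print(repr(aa_long_from_line(TRNA_LINE)))   # 'Arg'
print(repr(aa_long_from_line(TMRNA_LINE)))  # '' because "tmRNA" has only 5 characters

try:
    aa_table[aa_long_from_line(TMRNA_LINE)]
except KeyError as err:
    print("KeyError:", err)  # KeyError: ''
```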

Suggesting code to parse the tmRNA lines is beyond me, but an easy initial mitigation would be to have aragorn call only tRNAs and skip tmRNA genes. On my machine this works by removing the '-m' flag when calling aragorn:

» aragorn Elmer.fasta -gc11 -t   -c  -w | tail -8
37  tRNA-Thr                 [99092,99164]	34  	(ggt)
38  tRNA-Arg                 [99283,99355]	34  	(acg)
39  tRNA-Leu               [108208,108281]	34  	(taa)
40  tRNA-Leu               [108598,108680]	34  	(gag)
41  tRNA-Val               [109257,109328]	33  	(gac)
42  tRNA-Leu               [110275,110352]	36  	(caa)
43  tRNA-Ser               [110601,110692]	37  	(gct)
44  tRNA-Gln               [117516,117591]	35  	(ttg)
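As a longer-term alternative to dropping '-m', the script could branch on the gene-type column instead of assuming every line is a tRNA. Below is a rough sketch of that idea, not a patch against the actual script: aragorn_line_to_gff3, the seqid argument, the "Elmer" seqid in the usage loop, and the anticodon/tag_peptide attribute names are all hypothetical. It also assumes, from the output above, that the last column of a tmRNA line is the tag-peptide sequence, and (from aragorn's usual output, not shown here) that complementary-strand hits are prefixed with "c".

```python
import re
from typing import Optional

# Matches the coordinate column, e.g. "[100524,100803]".
# Assumption: aragorn prefixes complementary-strand hits with "c", e.g. "c[10,90]".
COORDS = re.compile(r"^(c?)\[(\d+),(\d+)\]$")


def aragorn_line_to_gff3(line: str, seqid: str) -> Optional[str]:
    """Hypothetical converter from one aragorn -w gene line to a GFF3 row.

    Returns None for lines it does not recognise instead of raising, so an
    unexpected gene type cannot crash the whole workflow step.
    """
    data = line.split()
    if len(data) < 5:
        return None
    gene_type, coords = data[1], data[2]
    match = COORDS.match(coords)
    if match is None:
        return None
    strand = "-" if match.group(1) == "c" else "+"
    start, end = match.group(2), match.group(3)  # passed through unchanged

    if gene_type.startswith("tRNA-"):
        feature = "tRNA"
        # Assumed: last column holds the anticodon in parentheses, e.g. "(acg)".
        attrs = f"Name={gene_type};anticodon={data[-1].strip('()')}"
    elif gene_type == "tmRNA":
        feature = "tmRNA"
        # Assumed: last column is the tag-peptide sequence, e.g. "ANSNVASAYALAA*".
        attrs = f"Name=tmRNA;tag_peptide={data[-1]}"
    else:
        return None

    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.
    return "\t".join([seqid, "aragorn", feature, start, end, ".", strand, ".", attrs])


for line in (
    "38  tRNA-Arg                 [99283,99355]\t34  \t(acg)",
    "39  tmRNA                  [100524,100803]\t91,132\tANSNVASAYALAA*",
):
    print(aragorn_line_to_gff3(line, "Elmer"))
```

For the two lines copied from the output above, this sketch yields one tRNA row and one tmRNA row instead of a KeyError. Returning None for unrecognised lines keeps the failure mode soft; a real fix might prefer to log those lines rather than silently skip them.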