gff from sRNAbench output - invalid literal error #77

Open
jonahcullen opened this issue Apr 11, 2023 · 4 comments

Comments

@jonahcullen

Expected behavior and actual behavior.

I am attempting to run mirtop gff on the output of sRNAbench. I expect a GFF file to be returned. The same command works with the output from miraligner.

Steps to reproduce the problem.

mirtop gff --format srnabench --sps eca --hairpin hairpin.fa --gtf eca.gff3 -o HERE ../LocalTEST/

returns

04/11/2023 03:21:21 INFO Run annotation
Traceback (most recent call last):
  File "/opt/conda/envs/small/bin/mirtop", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/command_line.py", line 31, in main
    reader(kwargs["args"])
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/gff/__init__.py", line 49, in reader
    out_dts[fn] = srnabench.read_file(fn, args)
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 47, in read_file
    source_iso = _read_iso(reads_iso)
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 169, in _read_iso
    iso[(cols[0], m)] = _translate(anno[m], cols[4])
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 206, in _translate
    iso.extend(_iso_snp(int(nt.split(":")[0])))
ValueError: invalid literal for int() with base 10: '-$16'

Specifications like the version of the project, operating system, or hardware.

I am using mirtop (0.4.25) and sRNAbench.jar (2.0) on a university HPC running CentOS Linux 7.

Thanks for your time,
Jonah.

@jonahcullen
Author

Apologies, I should have looked a little closer and reported the isoLabel that is causing the issue: -$16:G>A,19:T>C,20:G>A. The full line (excluding RPMs) is:

TGGAATGTAAGGAAGTATGCAG	eca-miR-1$eca-miR-206	eca-mir-1-2$eca-mir-1-1$eca-mir-206-2	nta#G|nta#G#1$NucVar	-$16:G>A,19:T>C,20:G>A
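
The failure can be reproduced outside mirtop with a minimal sketch. It applies the expression from the traceback, int(nt.split(":")[0]), to the last field of the line above. Splitting the field on commas is an assumption about how the entries are iterated, and positions_or_errors is a hypothetical helper name, not part of mirtop.

```python
def positions_or_errors(variant_field):
    """Try to read the position of each comma-separated variant.

    Assumption: variants are comma-separated; the position is parsed as in
    the traceback, with int(nt.split(":")[0]).
    """
    results = []
    for nt in variant_field.split(","):
        token = nt.split(":")[0]
        try:
            results.append((token, int(token)))
        except ValueError as err:
            # Keep the error text so the offending token is visible.
            results.append((token, str(err)))
    return results


# Last field of the reported line.
for token, outcome in positions_or_errors("-$16:G>A,19:T>C,20:G>A"):
    print(repr(token), "->", outcome)
```

Only the first token, -$16, fails to parse; 19 and 20 are read as integers, which matches the ValueError in the traceback.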

@lpantano
Contributor

Thank you for submitting this error. Could you share the hairpin file and the GFF file? I can try to debug with those and the line you identified as problematic.

@lpantano
Contributor

Do you know what the - symbol would mean there?

@jonahcullen
Author

jonahcullen commented Apr 21, 2023

Thank you for your response! I do not know what - means here. It occurs rarely alongside other variants (e.g. 18:A>G), but it always causes an error when it is the first one listed. That same column (sequenceVariant) also contains - on its own, with no other variants. I'm guessing the update from sRNAbench v1.2 or v1.6 to 2.0 is what is causing the issue. For example, exact no longer occurs anywhere in the microRNAannotation.txt file.

I've attached eca3.ens_mirtop.gff.txt, hairpin.fa.txt (miRBase v22 filtered to include only eca), and microRNAannotation.txt. Apologies, I had to append .txt to the FASTA and GFF file names.
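
One possible reading of the - symbol, offered only as an assumption: in the reported line, $ also separates the miRNA names (eca-miR-1$eca-miR-206) and the isoLabel entries, so the variant field may hold one entry per miRNA, with a bare - meaning "no variant" for that entry. Nothing in this thread confirms that interpretation. The sketch below parses the field under that assumption; split_variant_field is a hypothetical helper, not an existing mirtop function.

```python
def split_variant_field(field):
    """Split a variant field into one list of (position, change) per entry.

    Assumptions (not confirmed in this thread): '$' separates entries that
    belong to different miRNAs, and a bare '-' means the entry has no variant.
    """
    entries = []
    for entry in field.split("$"):
        variants = []
        if entry != "-":
            for nt in entry.split(","):
                position, change = nt.split(":")
                variants.append((int(position), change))
        entries.append(variants)
    return entries


print(split_variant_field("-$16:G>A,19:T>C,20:G>A"))
print(split_variant_field("-"))
print(split_variant_field("18:A>G"))
```

Under this reading, the problematic field yields an empty list for the first entry and three variants for the second, and a field containing only - yields a single empty entry. Whether that matches what sRNAbench 2.0 intends would need to be checked against its documentation.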
