Error when running IsoQuant with customized gtf file. #274

Open
SuiYue-2308 opened this issue Jan 7, 2025 · 3 comments
Labels
input data Issue is caused by input data

Comments

@SuiYue-2308

Hi,

I'm using IsoQuant for transcript discovery. I deleted 50% of the transcripts on chr1 in my customized GTF file, but now the run fails to complete.

2025-01-07 09:44:46,678 - INFO - Running IsoQuant version 3.6.1
2025-01-07 09:44:46,678 - WARNING - Output folder already exists, some files may be overwritten.
2025-01-07 09:44:46,681 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2025-01-07 09:44:46,682 - INFO -  === IsoQuant pipeline started === 
2025-01-07 09:44:46,682 - INFO - gffutils version: 0.13
2025-01-07 09:44:46,682 - INFO - pysam version: 0.22.1
2025-01-07 09:44:46,682 - INFO - pyfaidx version: 0.8.1.3
2025-01-07 09:44:46,682 - INFO - Converting gene annotation file to .db format (takes a while)...
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:770: UserWarning: It appears you have a gene feature in your GTF file. You may want to use the `disable_infer_genes=True` option to speed up database creation
  warnings.warn(
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:763: UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts=True` option to speed up database creation
  warnings.warn(
2025-01-07 09:44:46,690 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issuesTraceback (most recent call last):
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 790, in _populate_from_lines
    self._insert(f, c)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 566, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.IntegrityError: UNIQUE constraint failed: features.id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 819, in <module>
    main(sys.argv[1:])
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 813, in main
    run_pipeline(args)
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 749, in run_pipeline
    args.genedb = convert_gtf_to_db(args)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 144, in convert_gtf_to_db
    gtf_filename, genedb_filename = convert_db(gtf_filename, genedb_filename, gtf2db, args)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 360, in convert_db
    convert_fn(gtf_filename, genedb_filename, args.complete_genedb, args.gtf_check)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 133, in gtf2db
    gffutils.create_db(gtf, db, force=True, keep_order=True, merge_strategy='error',
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 1401, in create_db
    c.create()
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 543, in create
    self._populate_from_lines(self.iterator)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 792, in _populate_from_lines
    fixed, final_strategy = self._do_merge(f, self.merge_strategy)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 257, in _do_merge
    raise ValueError("Duplicate ID {0.id}".format(f))
ValueError: Duplicate ID ENST00000619216

All the other chromosomes seem to be fine, because I can see results for them.
image

@andrewprzh
Collaborator

Dear @SuiYue-2308

This error is raised by gffutils, which converts the GTF file to the internal database format.
It complains about a duplicate ID:
ValueError: Duplicate ID ENST00000619216

By the format specification, all IDs in a GTF/GFF file must be distinct, even across different feature types, meaning that a gene and a transcript cannot share the same ID.
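
For anyone hitting the same error, here is a rough sketch of how duplicated IDs could be located before running IsoQuant. It is not part of IsoQuant or gffutils. It assumes that gene records are keyed by their `gene_id` attribute and transcript records by their `transcript_id` attribute, and that attributes use the usual `key "value";` form. The function name `find_duplicate_ids` and the sample IDs (`G1`, `T1`) are made up for illustration.

```python
import re
from collections import Counter

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def find_duplicate_ids(gtf_lines):
    """Return the sorted IDs shared by more than one gene/transcript record.

    Assumption: genes are identified by gene_id, transcripts by transcript_id.
    Other feature types (exon, CDS, ...) are ignored.
    """
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        feature_type = fields[2]
        attrs = dict(ATTR_RE.findall(fields[8]))
        if feature_type == "gene":
            record_id = attrs.get("gene_id")
        elif feature_type == "transcript":
            record_id = attrs.get("transcript_id")
        else:
            continue
        if record_id is not None:
            counts[record_id] += 1
    return sorted(rid for rid, n in counts.items() if n > 1)


# Hypothetical records: the transcript T1 is listed twice.
sample = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\ttranscript\t1\t90\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
print(find_duplicate_ids(sample))
```

To check a real annotation, an open file handle can be passed instead of the list, for example `find_duplicate_ids(open("annotation.gtf"))`, where the file name is a placeholder. Because gene and transcript IDs are counted together, a gene and a transcript that share one ID are reported as well.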

How did you get this annotation?

Best
Andrey

@andrewprzh added the "input data" label (Issue is caused by input data) Jan 8, 2025
@SuiYue-2308
Author

Hi Andrey,

Thank you so much for your reply! I had added some genes to the GTF. After I removed the duplicate ID, it works now!
image

Thank you!

Best
Yue

@andrewprzh
Collaborator

@SuiYue-2308 I'm glad it worked out!
