Inconsistency among read_assignments.tsv, .transcript_counts.tsv, and *.gene_counts.tsv #290

airichli · 2025-02-19T22:40:57Z

Hi,

Thanks for developing this nice tool. I met one issue that the count is inconsistent among files as list in the title.
My command line is:
./IsoQuant-3.5.0/isoquant.py -d nanopore --stranded forward --fastq fastq1 fastq2 fastq3 --reference human_genome.fa --genedb
human_genome.gtf --complete_genedb --output output --prefix human --labels a b c -t 10 --model_construction_strategy default_ont --clean_start --matching_strategy default --splice_correction_strategy default_ont --model_construction_strategy default_ont --transcript_quantification unique_only --gene_quantification unique_only

Take gene ENSG00000290425 (its isoforms include ENST00000652586 and ENST00000617243) as an example. I extract all the information related with ENSG00000290425 from read_assignments.txt file as below.

ENSG00000290425.read_assignments.txt

According to *.transcript_counts.tsv, the transcript counts for ENST00000652586 and ENST00000617243 are 36 and 4, respectively. According to *.gene_counts.tsv, the gene count for ENSG00000290425 is 146.
However, I got different values when extract from the read_assighments.txt file as below:
grep ENST00000652586 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 19)
grep ENST00000617243 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 3)
grep ENSG00000290425 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 35)

Very confusing. Actually, I checked 1490 isoforms, of which 1279 isoforms have consistent count, 211 isoforms have inconsistent count.

It would be great if isoquant could provide the exact read ids that corresponds to *.transcript_counts.tsv and *.gene_counts.tsv.

Looking forward to hearing from you. Thanks.

Best,
Aifu

andrewprzh · 2025-02-20T00:00:21Z

Dear @airichli

This inconsistency is expected.

To add a +1 to an isoform abundance, IsoQuant requires a read to be assigned uniquely, meaning that it is consistent with the annotation and there is no ambiguity in the assignment. However, to add +1 to a gene count, a read can be assigned to 2 or more isoforms (of this gene) ambiguously. In other words, not all reads that are unambiguously assigned to a gene can be unambiguously assigned to a transcript.
This explains inconsistency between gene counts and isoform counts.

Inconsistency between read assignments and isoforms counts are likely caused by the additional filtering imposed on counts. For example, to report a spliced isoform, it must have at least one spliced read assigned to it.

P.S. I suggest you to update you IsoQuant to the latest 3.6.3 version, there were some quantification improvements recently.

Best
Andrey

airichli · 2025-02-20T01:40:52Z

Dear Andrey, thank you so much for your timely reply, I really appreciate it.

I would like to check:
Assuming the *.transcript_counts.tsv, and *.gene_counts.tsv files (the quantification mode is unique-only, so only uniquely assigned reads are counted) are based on *read_assignments.tsv. The unique reads for ENST00000652586 in *read_assignments.tsv are only 19, but the count for ENST00000652586 in *.transcript_counts.tsv file is 36 (if there exists filtering, the number should less than 19). Does *.transcript_counts.tsv file includes non-unique reads as well?

My purpose is just want to know the 36 read IDs for ENST00000652586 in *.transcript_counts.tsv, it seems hard for me to deduce from *read_assignments.tsv. Is there any way to know the corresponding read IDs for the count in *.transcript_counts.tsv?

I will try the latest 3.6.3 version. Thanks.

Best,
Aifu

airichli changed the title ~~Inconsistency between *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv~~ Inconsistency among *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv Feb 19, 2025

andrewprzh added the question Further information is requested label Feb 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistency among read_assignments.tsv, .transcript_counts.tsv, and *.gene_counts.tsv #290

Inconsistency among read_assignments.tsv, .transcript_counts.tsv, and *.gene_counts.tsv #290

airichli commented Feb 19, 2025 •

edited

Loading

andrewprzh commented Feb 20, 2025

airichli commented Feb 20, 2025 •

edited

Loading

Inconsistency among *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv #290

Inconsistency among *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv #290

Comments

airichli commented Feb 19, 2025 • edited Loading

andrewprzh commented Feb 20, 2025

airichli commented Feb 20, 2025 • edited Loading

Inconsistency among read_assignments.tsv, .transcript_counts.tsv, and *.gene_counts.tsv #290

Inconsistency among read_assignments.tsv, .transcript_counts.tsv, and *.gene_counts.tsv #290

airichli commented Feb 19, 2025 •

edited

Loading

airichli commented Feb 20, 2025 •

edited

Loading