You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thanks for developing this nice tool. I met one issue that the count is inconsistent among files as list in the title.
My command line is:
./IsoQuant-3.5.0/isoquant.py -d nanopore --stranded forward --fastq fastq1 fastq2 fastq3 --reference human_genome.fa --genedb
human_genome.gtf --complete_genedb --output output --prefix human --labels a b c -t 10 --model_construction_strategy default_ont --clean_start --matching_strategy default --splice_correction_strategy default_ont --model_construction_strategy default_ont --transcript_quantification unique_only --gene_quantification unique_only
Take gene ENSG00000290425 (its isoforms include ENST00000652586 and ENST00000617243) as an example. I extract all the information related with ENSG00000290425 from read_assignments.txt file as below.
According to *.transcript_counts.tsv, the transcript counts for ENST00000652586 and ENST00000617243 are 36 and 4, respectively. According to *.gene_counts.tsv, the gene count for ENSG00000290425 is 146.
However, I got different values when extract from the read_assighments.txt file as below:
grep ENST00000652586 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 19)
grep ENST00000617243 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 3)
grep ENSG00000290425 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 35)
Very confusing. Actually, I checked 1490 isoforms, of which 1279 isoforms have consistent count, 211 isoforms have inconsistent count.
It would be great if isoquant could provide the exact read ids that corresponds to *.transcript_counts.tsv and *.gene_counts.tsv.
Looking forward to hearing from you. Thanks.
Best,
Aifu
The text was updated successfully, but these errors were encountered:
airichli
changed the title
Inconsistency between *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv
Inconsistency among *read_assignments.tsv, *.transcript_counts.tsv, and *.gene_counts.tsv
Feb 19, 2025
To add a +1 to an isoform abundance, IsoQuant requires a read to be assigned uniquely, meaning that it is consistent with the annotation and there is no ambiguity in the assignment. However, to add +1 to a gene count, a read can be assigned to 2 or more isoforms (of this gene) ambiguously. In other words, not all reads that are unambiguously assigned to a gene can be unambiguously assigned to a transcript.
This explains inconsistency between gene counts and isoform counts.
Inconsistency between read assignments and isoforms counts are likely caused by the additional filtering imposed on counts. For example, to report a spliced isoform, it must have at least one spliced read assigned to it.
P.S. I suggest you to update you IsoQuant to the latest 3.6.3 version, there were some quantification improvements recently.
Dear Andrey, thank you so much for your timely reply, I really appreciate it.
I would like to check:
Assuming the *.transcript_counts.tsv, and *.gene_counts.tsv files (the quantification mode is unique-only, so only uniquely assigned reads are counted) are based on *read_assignments.tsv. The unique reads for ENST00000652586 in *read_assignments.tsv are only 19, but the count for ENST00000652586 in *.transcript_counts.tsv file is 36 (if there exists filtering, the number should less than 19). Does *.transcript_counts.tsv file includes non-unique reads as well?
My purpose is just want to know the 36 read IDs for ENST00000652586 in *.transcript_counts.tsv, it seems hard for me to deduce from *read_assignments.tsv. Is there any way to know the corresponding read IDs for the count in *.transcript_counts.tsv?
Hi,
Thanks for developing this nice tool. I met one issue that the count is inconsistent among files as list in the title.
My command line is:
./IsoQuant-3.5.0/isoquant.py -d nanopore --stranded forward --fastq fastq1 fastq2 fastq3 --reference human_genome.fa --genedb
human_genome.gtf --complete_genedb --output output --prefix human --labels a b c -t 10 --model_construction_strategy default_ont --clean_start --matching_strategy default --splice_correction_strategy default_ont --model_construction_strategy default_ont --transcript_quantification unique_only --gene_quantification unique_only
Take gene ENSG00000290425 (its isoforms include ENST00000652586 and ENST00000617243) as an example. I extract all the information related with ENSG00000290425 from read_assignments.txt file as below.
ENSG00000290425.read_assignments.txt
According to *.transcript_counts.tsv, the transcript counts for ENST00000652586 and ENST00000617243 are 36 and 4, respectively. According to *.gene_counts.tsv, the gene count for ENSG00000290425 is 146.
However, I got different values when extract from the read_assighments.txt file as below:
grep ENST00000652586 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 19)
grep ENST00000617243 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 3)
grep ENSG00000290425 ENSG00000290425.read_assignments.txt|awk '$6~/uniq/{print}'|wc -l (this results in 35)
Very confusing. Actually, I checked 1490 isoforms, of which 1279 isoforms have consistent count, 211 isoforms have inconsistent count.
It would be great if isoquant could provide the exact read ids that corresponds to *.transcript_counts.tsv and *.gene_counts.tsv.
Looking forward to hearing from you. Thanks.
Best,
Aifu
The text was updated successfully, but these errors were encountered: