1. How to cluster transcripts from multiple query GTF files? 2. Something weird about tss_id #99

dudududu12138 · 2024-12-25T08:39:15Z

Hi, I used gffcompare to merge assembly results from multiple samples. The command is listed below:

gffcompare -T -S $gffcompare_inputs -o $output

I didn't provide a reference annotation because I only want to merge the assembly results from different samples. But I have two questions.

Question 1.

How does gffcompare cluster transcripts from multiple samples without a reference annotation? Your manual (https://ccb.jhu.edu/software/stringtie/gffcompare.shtml) and your paper only describe duplicate removal. In my results, I saw some transcripts with a contained_in tag. You said you want to retain alternative start sites, but some transcripts share the same tss_id and one of them carries a contained_in tag, which seems weird. Below is an example:

Question 2.

After I found the problem above, I checked the tss_id values in my results in three steps:

Step 1: Group transcripts by tss_id
Step 2: Calculate the maximum distance between start sites among transcripts with the same tss_id
Step 3: Get the distribution of these distances
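
The three steps above can be sketched roughly as follows. This is a minimal illustration, not gffcompare's own logic: the function name `tss_spread` is hypothetical, it assumes the tss_id attribute sits on `transcript` lines of the combined GTF, and it assumes the start site is the left coordinate on the + strand and the right coordinate on the - strand.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def tss_spread(gtf_lines):
    """Return {tss_id: max distance between start sites in that group}.

    Assumption: tss_id is present on 'transcript' feature lines.
    Assumption: TSS = start on '+' (or unstranded), end on '-'.
    """
    positions = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tss_id = attrs.get("tss_id")
        if tss_id is None:
            continue
        start, end, strand = int(fields[3]), int(fields[4]), fields[6]
        # Strand-aware start site: right end for minus-strand transcripts
        tss = end if strand == "-" else start
        positions[tss_id].append(tss)
    # Step 2: max distance within each tss_id group
    return {tid: max(pos) - min(pos) for tid, pos in positions.items()}

# Made-up example lines, for illustration only
example = [
    'chr1\tx\ttranscript\t100\t500\t.\t+\t.\ttranscript_id "T1"; tss_id "TSS1";',
    'chr1\tx\ttranscript\t150\t500\t.\t+\t.\ttranscript_id "T2"; tss_id "TSS1";',
    'chr1\tx\ttranscript\t100\t900\t.\t-\t.\ttranscript_id "T3"; tss_id "TSS2";',
    'chr1\tx\ttranscript\t500\t905\t.\t-\t.\ttranscript_id "T4"; tss_id "TSS2";',
]
spreads = tss_spread(example)
```

If the left coordinate were used for minus-strand transcripts as well, TSS2 in this example would appear to span 400 bp instead of 5 bp, so the choice of coordinate matters when checking against -d.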

Below are two figures showing the results of this check. Although the default value of the -d parameter is 100, there are still transcripts sharing the same tss_id whose start sites are more than 100 bp apart.
The first is a pie plot, where Category is the max distance: =0, >0 & <=100, or >100.

The second is a histogram of the distances, with distance=0 removed first.
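
For readers who want to reproduce the pie-plot categories, here is a small sketch of the tally. The function name `categorize` and the input dict (tss_id mapped to its max start-site distance) are hypothetical, and the numbers are made up for illustration.

```python
from collections import Counter

def categorize(spreads, cutoff=100):
    """Count tss_id groups per category: =0, >0 & <=cutoff, >cutoff."""
    counts = Counter()
    for distance in spreads.values():
        if distance == 0:
            counts["=0"] += 1
        elif distance <= cutoff:
            counts[f">0 & <={cutoff}"] += 1
        else:
            counts[f">{cutoff}"] += 1
    return dict(counts)

# Hypothetical per-tss_id max distances
example_spreads = {"TSS1": 0, "TSS2": 37, "TSS3": 100, "TSS4": 2400}
categories = categorize(example_spreads)

# Values for the histogram, with distance=0 removed first
nonzero = sorted(d for d in example_spreads.values() if d > 0)
```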

Thank you so much for your help. I look forward to your reply!

dudududu12138 · 2024-12-25T11:24:41Z

Hi, I have figured out why two transcripts can have the same tss_id even though their transcription start sites are far apart. There is a cutoff for clustering transcript start sites (-d, default 100), but it only applies to positive-strand genes; negative-strand genes don't adhere to it.
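
One way to check this observation on a given output is to count, per strand, how many tss_id groups exceed the cutoff. The sketch below is only an illustration: the function name `over_cutoff_by_strand` and the record layout (tss_id, strand, start, end) are hypothetical, the data are made up, and the start site is assumed to be the right coordinate on the - strand.

```python
from collections import defaultdict

def over_cutoff_by_strand(records, cutoff=100):
    """Return {strand: (groups over cutoff, total groups)}.

    records: iterable of (tss_id, strand, start, end) tuples.
    Assumption: TSS = start on '+', end on '-'.
    """
    groups = defaultdict(list)
    for tss_id, strand, start, end in records:
        tss = end if strand == "-" else start
        groups[(tss_id, strand)].append(tss)
    total = defaultdict(int)
    over = defaultdict(int)
    for (_tss_id, strand), positions in groups.items():
        total[strand] += 1
        if max(positions) - min(positions) > cutoff:
            over[strand] += 1
    return {strand: (over[strand], total[strand]) for strand in total}

# Made-up records: the minus-strand group has right ends 900 and 1200
example_records = [
    ("TSS1", "+", 100, 500),
    ("TSS1", "+", 150, 500),
    ("TSS2", "-", 100, 900),
    ("TSS2", "-", 300, 1200),
]
by_strand = over_cutoff_by_strand(example_records)
```

A result where only the - strand shows groups over the cutoff would be consistent with the behavior described above.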

gpertea self-assigned this Dec 25, 2024

gpertea added the to investigate label Dec 25, 2024
