Hi, I used gffcompare to merge assembly results from multiple samples. The command is listed below:
gffcompare -T -S $gffcompare_inputs -o $output
I didn't provide a reference annotation because I just want to merge the assembly results from different samples, but I have two questions.
Question 1.
How does gffcompare cluster transcripts from multiple samples without a reference annotation? Your manual [https://ccb.jhu.edu/software/stringtie/gffcompare.shtml] and your paper only describe how duplicates are removed. In my results, I saw some transcripts with a contained_in tag. You said you want to keep alternative start sites, but some transcripts share the same tss_id and one of them carries a contained_in tag. That seems weird. Below is an example:
Question 2.
After I found the problem above, I checked the tss_id in my results. I followed 3 steps:
Step 1: Group transcripts by tss_id
Step 2: Calculate the max distance between start sites among transcripts with the same tss_id
Step 3: Get the distribution of these distances
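For anyone who wants to reproduce this check, the three steps can be sketched roughly as below. This is only a minimal sketch, not my exact script: it assumes the combined GTF has `transcript` feature lines that carry a `tss_id` attribute, and it assumes the start site is the leftmost coordinate on the + strand and the rightmost coordinate on the - strand. The function name `tss_spread` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def tss_spread(gtf_lines):
    """Return {tss_id: max distance between start sites in that group}."""
    starts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only transcript-level records are considered (assumption).
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tss_id = attrs.get("tss_id")
        if tss_id is None:
            continue
        start, end, strand = int(fields[3]), int(fields[4]), fields[6]
        # Assumption: on the minus strand the start site is the 'end' column.
        starts[tss_id].append(end if strand == "-" else start)
    # Step 2: max distance within each tss_id group.
    return {tss: max(pos) - min(pos) for tss, pos in starts.items()}
```

Calling `tss_spread(open(path))` on the combined GTF gives one distance per tss_id, and the values can then be binned into 0, >0 & <=100, and >100 for step 3.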
Below are 2 figures from this check. Although the default for the -d parameter is 100, some transcripts sharing the same tss_id still have start sites more than 100 bp apart.
Below is a pie plot, where Category groups the max distance into =0, >0 & <=100, and >100.
Below is a histogram of the distances, with distance=0 removed first.
Thank you so much for looking into my questions; I look forward to your reply!
Hi, I have figured out why two transcripts can have the same tss_id even though their transcription start sites are far apart. You set a cutoff for clustering transcript start sites (-d, default 100), but that only applies to positive-strand genes; negative-strand genes don't adhere to that cutoff.
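One way to check this on your own output is to count, per strand, how many tss_id groups exceed the cutoff. The sketch below is only an illustration under my own assumptions: the records are hypothetical `(tss_id, strand, start, end)` tuples that you extract from the combined GTF yourself, the start site is taken as `start` on the + strand and `end` on the - strand, and `over_cutoff_by_strand` is a made-up helper name.

```python
from collections import defaultdict

def over_cutoff_by_strand(records, cutoff=100):
    """Return {strand: (groups over cutoff, total groups)}."""
    sites = defaultdict(list)
    for tss_id, strand, start, end in records:
        # Assumption: strand-aware start site (end coordinate on '-').
        sites[(tss_id, strand)].append(end if strand == "-" else start)
    counts = defaultdict(lambda: [0, 0])
    for (_, strand), positions in sites.items():
        counts[strand][1] += 1
        if max(positions) - min(positions) > cutoff:
            counts[strand][0] += 1
    return {strand: tuple(c) for strand, c in counts.items()}
```

If the cutoff really is applied only on the + strand, the over-cutoff count should be zero (or close to it) for "+" and clearly non-zero for "-".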