Error when running prepDE.py #238
Comments
It seems this issue appears in the new version. There is no problem for Stringtie 1.3.x. |
@ttgump, that is incorrect (I hope!), prepDE.py should work with version 2.0 just the same, I tested it on the original RNA-Seq protocol data (chrX) and it worked as expected.
[ 10/23 EDIT ] @ttgump you were in fact correct, apologies for the above, it turns out that the chrX data set was too simple to expose the problem in stringtie 2.0 which indeed outputs additional STRG entries when the
Admittedly it might be tricky to get it running properly, especially when a parent folder is provided as input (instead of a space delimited table with sampleID and GTF_path), as it pulls in other .gtf files in that directory tree that you might not have wanted to use for prepDE.py.
For the regular, full stringtie->prepDE pipeline, prepDE is expecting to see a
Perhaps in the estimation phase you forgot to provide the
For further diagnosing prepDE running issues I'd suggest taking a look at #234 (comment) where I posted a modified version of the prepDE.py script which should help with finding potential issues with the input GTF files (if they were not all generated as expected). Let me know if you encounter any problems running that version on your data. |
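The explicit "sampleID GTF_path" table mentioned above can be written by hand, or generated with a few lines of Python. This is only a minimal sketch: the helper name `write_sample_list` and the file names in the usage note are hypothetical, and it assumes sample IDs and paths contain no spaces.

```python
from pathlib import Path


def write_sample_list(samples, list_path):
    """Write one 'sampleID GTF_path' line per sample; fail early on missing files."""
    lines = []
    for sample_id, gtf_path in samples:
        gtf = Path(gtf_path)
        # Listing each GTF explicitly avoids picking up stray .gtf files
        # that happen to live in the same directory tree.
        if not gtf.is_file():
            raise FileNotFoundError(f"{sample_id}: no GTF at {gtf}")
        lines.append(f"{sample_id} {gtf}")
    Path(list_path).write_text("\n".join(lines) + "\n")
    return lines
```

For example, `write_sample_list([("neg1", "5.ballgown/neg1/neg1.gtf")], "sample_list.txt")` would produce a one-line table (the path here is made up for illustration); the resulting file, rather than the parent folder, would then be passed to prepDE.py with `-i`.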
Hi, |
@gpertea |
Then that's a problem we'd surely like to fix. Stringtie v2 did improve assembly accuracy in our tests (to answer @andy3404's question), but there were sections of the code which were a complete rewrite compared to v1.3 (some of them due to adding support for long reads) and that accounts for the differences in the output. Occasionally, it seems that with the -e option (which means "only estimate the abundance, do not assemble"), v2 may "drop" (i.e. not report) some redundant (or uncovered) reference transcripts, which in turn makes prepDE.py fail.
But that should not happen and we would like to fix it. It would be very helpful if any of you can provide us with such an example, where running stringtie v2 with the -e option seems to have dropped (not reported) a transcript from the merge file that was given with the -G option.
As an example: @ttgump, before that error message you should have seen the sample name printed there, right? (it would be helpful if you had shown the entire prepDE message here) |
Yes. The error happens on the second sample. How could I share the bam file with you? It is several Gbs. Thanks.
… On Oct 16, 2019, at 8:07 AM, Geo Pertea ***@***.***> wrote:
Then that's a problem we'd surely like to fix. Stringtie v2 did improve assembly accuracy in our tests (to answer @andy3404's question), but there were sections of the code which were a complete rewrite compared to v1.3 (some of them due to adding support for long reads) and that accounts for the differences in the output and occasionally, it seems that the output when -e option is used (which means "only estimate the abundance, do not assemble"), in v2 may sometimes "drop" (i.e. not report) some redundant (or uncovered) reference transcripts which in turn makes prepDE.py fail.
But that should not happen and we would like to fix it. It would be very helpful if any of you can provide us with such an example, where running stringtie v2 with -e option seems to have dropped (not report) a transcript from the merge file that was given with -G option.
As an example: @ttgump, before that error message you should have seen the sample name printed there, right? (it would be helpful if you had shown the entire prepDE message here)
Was that the 2nd sample which was printed there as being processed by prepDE ? Because that means that the first sample is actually the one missing that ENST00000647043.1, so the problem would be with the stringtie -e run for that first sample (even though prepDE does not complain about that one), because that was the one dropping ENST00000647043.1 from the output of stringtie -e, even though that transcript is in the -G reference file. Do you think you can identify that first sample and perhaps share with me the corresponding .bam file and the file that you used with -G when you ran stringtie -e on that sample? That would allow us to properly debug this issue.
|
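The reasoning above (prepDE complains at the second sample, but the sample that actually dropped the transcript is an earlier one) can be checked without prepDE by comparing transcript IDs across samples. The sketch below is not part of prepDE.py; the names `transcript_ids` and `missing_per_sample` are hypothetical, and it assumes standard tab-separated GTF lines with a `transcript_id "..."` attribute.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect every transcript_id found in the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_per_sample(samples):
    """Map each sample to the transcript IDs seen in other samples but not in it."""
    id_sets = {name: transcript_ids(lines) for name, lines in samples.items()}
    union = set().union(*id_sets.values())
    return {
        name: sorted(union - ids)
        for name, ids in id_sets.items()
        if union - ids
    }
```

Here `samples` maps a sample name to an iterable of GTF lines (for instance an open file handle). Any sample that shows up in the result is a candidate for the `stringtie -e` run that dropped a transcript.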
I ran into the same problem using the prepDE.py that was updated 8 days ago. Here is the error printed on the screen: Error: could not locate transcript STRG.2470.1 entry for sample neg2. Maybe the GTF files are not all the same length? |
@Gin-Wang did you run the new prepDE.py with the
@ttgump and others willing/able to share such a BAM file, could you upload the file to Google Drive or a similar service which allows sharing the download link (so you can email/share the download link with [email protected]). Alternatively, send me an email so I can send you instructions for uploading the files to our FTP server, though lately we've had some problems with large uploads on that FTP server, but we could give it a try. |
@gpertea thanks for your answer. I tried it with "-v" just now and got a similar error. Here is the log: python2 6.prepDE.py -i 5.ballgown -v
I'd like to share the files you mentioned above, but it may take a while. It actually printed a similar error when I used only my pos1 and neg1 files, and those BAM files are smaller than the others, so I will upload those files. |
Thank you for your help with this problem. From that log it looks like the files that I would need should be shown in the first line of the file:
To repeat and clarify: I would NOT need multiple .bam files, just one should suffice (the one for the neg1 sample in your example here, the one for which prepDE did NOT fail). |
@gpertea today I ran the new script and got a similar problem to before. The code is attached below: #!/bin/sh And the error is:
Then I looked at prepDE.log and the file is empty. By the way, when I ran the old version of this script, the error was the same as the one I got today. Could you tell me the reason and how to solve it? |
Thank you so much @Gin-Wang. I took a look at the data you uploaded and I can confirm now that the problem is real and correctly reported by users in this thread (and in #234), stringtie 2.0 with the
We will fix this |
i'm glad that it may help, and thanks again for the great software stringtie and your work |
The main author fixed this issue in v2.0.4 which was just released (see https://github.com/gpertea/stringtie/releases/tag/v2.0.4). Now stringtie v2.0.4 should behave as expected with the |
Hi, I thought I'd add my 2 cents to this discussion since it's still listed as "open". I was getting the same error messages that everyone has been showing above. As suggested, I counted lines in each of my ballgown/sample/sample.gtf files using wc -l. Each of these files should have had the same number of lines as the "stringtie_merged.gtf" file created during stringtie --merge, but did not.
I was also following the protocol outlined by Pertea et al., 2016, which used stringtie v1.2.2 for their analysis. However, I am using two versions of stringtie: v1.3.5 and v2.0.3. BUT, for both of the stringtie versions I have been using, this command only works as expected if the elements are shuffled:
It seems that following the published command line causes the newer versions of the program to miss some element of the input, which allows the output to not rely on -e as it should. Even though the man/help page of later versions of stringtie clearly shows the proper organization of input files being different from that used by Pertea et al., I think this is not something people would typically think to look at if they're just following the published protocol verbatim.
@gpertea do you think you could build in an error that tells people the organization of their input order is wrong if their ballgown directory file line counts don't match the merged-files line count? I am super happy that this program exists at all, and I am also very happy with the amount of feedback and help available on this github issues page. Please keep up the good work, and I'd love to know if my solution is consistent on your end. |
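A check along the lines requested above could compare transcript IDs instead of raw `wc -l` counts, which has the advantage of naming the transcripts that went missing. The following is a rough sketch, not an existing stringtie or prepDE feature; `read_transcript_ids` and `check_sample_against_reference` are hypothetical names, and it assumes both files are tab-separated GTF with `transcript_id "..."` attributes.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_transcript_ids(gtf_path):
    """Return the set of transcript_id values found in a GTF file."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines carry no transcript records
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            match = TRANSCRIPT_ID.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def check_sample_against_reference(reference_gtf, sample_gtf):
    """Raise ValueError naming reference transcripts absent from the sample GTF."""
    dropped = sorted(
        read_transcript_ids(reference_gtf) - read_transcript_ids(sample_gtf)
    )
    if dropped:
        preview = ", ".join(dropped[:5])
        raise ValueError(
            f"{sample_gtf}: {len(dropped)} reference transcript(s) missing, "
            f"e.g. {preview}"
        )
```

Calling it with the merged GTF that was given to `-G` and each per-sample GTF in turn would flag an estimation run that did not report every reference transcript, before prepDE.py is run at all.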
@MarinaMann, thank you for the nice words but that problem is actually unrelated to the
Not sure if it's visible in your browser, but do you notice any difference between these two lines below?
(in my browser it looks like the dashes in the first line are a bit longer and closer to the letter that follows them, compared to those on the second line). There seems to have been an editing/typographical (?) error which affected our protocol paper when it was submitted for publishing, where sometimes the plain 'dash' or 'minus' character ('-') in our command lines got unexpectedly replaced with the "em dash" typographical symbol ('–'). I guess some word processing/publishing software like Microsoft Word even makes this kind of substitution automatically in some contexts. The issue was previously reported when users copied gffcompare commands from the protocol paper (see gpertea/gffcompare#3) but now I see it affected other command lines too. |
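Since the typographic dashes are hard to spot by eye, a pasted command line can be scanned for them before it is run. This is a small illustrative sketch; the table `DASH_LOOKALIKES` and both function names are made up for this example, and the list of look-alike characters is not exhaustive.

```python
# Unicode characters that are easily mistaken for the ASCII hyphen-minus.
DASH_LOOKALIKES = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
}


def find_dash_lookalikes(command):
    """Return (index, character) pairs for every non-ASCII dash in the command."""
    return [(i, ch) for i, ch in enumerate(command) if ch in DASH_LOOKALIKES]


def normalize_dashes(command):
    """Replace each look-alike dash with a single ASCII '-'."""
    return command.translate(str.maketrans(DASH_LOOKALIKES))
```

One caveat: a word processor may also collapse a double hyphen (as in `--merge`) into a single long dash, and `normalize_dashes` would turn that into just one `-`. It is therefore safer to use `find_dash_lookalikes` to locate the suspicious characters and retype those options by hand.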
Hello, I'm seeing the same issue with version 2.2.1: the GTF files obtained with -e -B have fewer transcripts than the file used as -G, which causes problems when running prepDE.py. However, version 2.1.4 seems to work just fine. |
@nurie05, could you please tell me:
|
@gpertea regarding your questions:
|
Hello, I am getting the same problem when trying to use prepDE.py3 on an assembly made with -L. My error is:
My estimation command is
Could you please offer suggestions for how to troubleshoot this? I have successfully used prepDE.py3 on short read data from the same sample without an issue. Is this error specific to the -L option? |
Hi @gpertea, I just want to update that I figured out the issue with my problem. It turns out, when I used stringtie 2.2.1 from git clone on the github page, it installed a version that is not fully updated. When I reinstalled from your website, or downloaded directly from the latest release, the error I reported above is resolved. It is slightly misleading that both ways of installations gave stringtie 2.2.1. Hopefully this can help others solve their errors too. |
Thank you @MoonyScotch for the update - actually the github version is a couple of commits ahead of the "release" 2.2.1 version, and those changes made the |
Hi, I am using v2.2.3, and the problem persists. Should I switch back to 2.2.1? I'm using only short reads. Thanks, |
@smallfishcui indeed unfortunately I can confirm that this issue is back in v2.2.3 -- when |
I get the following error when running prepDE.py on stringtie estimate output:
I only get this for one sample out of fifteen. I run the HISAT2 > Stringtie workflow as described in Pertea et al. (2016). I want to feed this into edgeR. Is there another option in case I can not resolve this problem?
Additional question: What is the difference between MSTRG and STRG?