preDE.py stops at the second file #234

Closed
AishMandya opened this issue Aug 26, 2019 · 24 comments

@AishMandya

AishMandya commented Aug 26, 2019

@tsznxx @nongbaoting @gpertea
Hi, I have modified the sample list file so that each line has the form
label space file path
label space file path
but the error persists.
Each GTF file is inside a subdirectory of the ballgown directory generated by stringtie -B -e.

$ prepDE.py -i sample_1st.txt
output:
0 A1_S1
1 A2_S2
Traceback (most recent call last):
  File "prepDE.py", line 257, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'A2_S2'
similar to issue #232
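
As a rough aid for checking the sample list described above, the following Python sketch validates a file in which each non-empty line is a label and a GTF path separated by whitespace. The function name `check_sample_list` is hypothetical and not part of prepDE.py, and the assumption that paths contain no spaces is mine.

```python
import os


def check_sample_list(path):
    """Return (line_number, problem) tuples for a sample list file.

    Assumption: each non-empty line is '<label> <path to GTF>', separated
    by whitespace, and the path itself contains no spaces.
    """
    problems = []
    seen_labels = set()
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 2:
                problems.append(
                    (number, "expected 2 fields, found %d" % len(fields))
                )
                continue
            label, gtf = fields
            if label in seen_labels:
                problems.append((number, "duplicate label %s" % label))
            seen_labels.add(label)
            if not os.path.isfile(gtf):
                problems.append((number, "file not found: %s" % gtf))
    return problems
```

An empty result only means the list is well formed; it says nothing about how the GTF files themselves were produced.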

@lelesama

Hi, I have the same problem. Have you solved it?

@AishMandya
Author

No, not yet. It seems to stop working at the second iteration, no matter which file it is. So it may be a glitch in the code or in the way I used stringtie to generate the GTF files. Also, I don't fully understand how the code works, so this is definitely inconclusive.

@lelesama

Hi AishMandya, it was driving me crazy! But when I use an older version, it works. Maybe you can try that; I hope it helps.

@e-lerat

e-lerat commented Sep 12, 2019

Hi everyone
Sorry, I can't help: I have the exact same problem. I even reran everything, since at first I thought I didn't have the right genome GTF file.
Unfortunately, it still does not work. If you get the answer, I am really interested!

Emmanuelle

@AishMandya
Author

AishMandya commented Sep 12, 2019 via email

@AishMandya
Author

AishMandya commented Sep 12, 2019 via email

@SofiaZhangtj

Did you use --merge during the stringtie step?

@AishMandya
Author

AishMandya commented Oct 4, 2019 via email

@SofiaZhangtj

SofiaZhangtj commented Oct 5, 2019

Hi Aish,
Thank you very much for your kind answer. I think I finally found my problem. The prepDE.py script seems to be based on version 1.2; the software has since been updated many times, but the script has not. I changed my stringtie version from 2 to 1.3.3, and then the script worked.

@coreyscipione

coreyscipione commented Oct 5, 2019

Hi @SofiaZhangtj and @AishMandya,
I am using stringtie 1.3.3 with prepDE.py to generate files for DESeq2, and I keep getting stuck at this error:
Traceback (most recent call last):
  File "/cluster/home/cscipion/scripts/prepDE", line 257, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'B1pr_S13_L002'

I have successfully used the sample 'B1pr_S13_L002' in several other comparisons, but this set of samples is rejecting it for some reason.

Any thoughts? Maybe @gpertea can help.

@SofiaZhangtj

Hi,
I found that even version 1.3.6 works for me. I think I ran into the same problem as you at the beginning; at that time I hadn't used the "-e" parameter in stringtie.

@gpertea
Owner

gpertea commented Oct 6, 2019

I'll investigate the possibility that some changes in Stringtie v2 may have affected the compatibility with prepDE.py, but in the past there were a lot of "errors" alleged by users of prepDE.py which were mainly caused by an incorrect usage of the script.
To reiterate and clarify: prepDE.py can only be used on a set of stringtie GTF outputs if stringtie was run, for all those outputs:

  • with the -e option
  • with the same file for the -G option.

Also, make sure that no other GTF files (like the reference annotation file) are present in those sub-directories; only the stringtie output GTF files should be found there. By default, prepDE scans all the sub-directories for .gtf files, and all of them are expected to have been produced by stringtie following the requirements above (-e option, same -G file).
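
The directory-layout requirement above can be checked with a small script. This is only a sketch: it assumes one sample per immediate sub-directory and exactly one stringtie output GTF in each, and the function names are hypothetical rather than part of prepDE.py. It cannot tell whether -e or the same -G file was used.

```python
import os


def gtf_files_per_subdir(root):
    """Map each immediate sub-directory of root to its sorted .gtf file names."""
    found = {}
    for name in sorted(os.listdir(root)):
        subdir = os.path.join(root, name)
        if os.path.isdir(subdir):
            found[name] = sorted(
                f for f in os.listdir(subdir) if f.endswith(".gtf")
            )
    return found


def suspicious_subdirs(root):
    """Return the sub-directories that do not hold exactly one .gtf file.

    Assumption: each sample directory should contain a single stringtie
    output GTF and nothing else ending in .gtf.
    """
    return {
        name: files
        for name, files in gtf_files_per_subdir(root).items()
        if len(files) != 1
    }
```

A directory listed with two or more files may contain a stray reference annotation; one listed with an empty list has no GTF output at all.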

@SofiaZhangtj

SofiaZhangtj commented Oct 7, 2019

@gpertea
Hi Pertea,
Thank you for your reply. The output GTF files of stringtie v2 have different numbers of lines, whereas in the previous version the numbers were the same. The t_data.ctab files, however, remain the same as in the older version. I think that's why prepDE.py doesn't work with version 2.0 in my case.

Line counts for the older version (GTF files; two pairs of identical sequencing data):
1
2
Line counts for the new version:
3
4

Any suggestions will be helpful. Thank you.
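
One way to narrow down a difference like the one described above is to compare the transcript IDs listed in each output GTF. The sketch below assumes the standard nine-column, tab-separated GTF layout with a `transcript_id "..."` attribute on `transcript` lines. The function names are hypothetical, and the premise that outputs produced with -e and the same -G file should list the same transcripts is my reading of the requirements stated earlier in the thread, not something confirmed here.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_path):
    """Collect transcript_id values from the 'transcript' lines of a GTF file."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # only look at transcript records
            match = TRANSCRIPT_ID.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def compare_to_first(gtf_paths):
    """Report, for each later file, the IDs missing or extra relative to the first."""
    reference = transcript_ids(gtf_paths[0])
    report = {}
    for path in gtf_paths[1:]:
        current = transcript_ids(path)
        report[path] = {
            "missing": sorted(reference - current),
            "extra": sorted(current - reference),
        }
    return report
```

If every entry in the report has empty "missing" and "extra" lists, the files agree on their transcript sets and the cause is likely elsewhere.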

@coreyscipione

coreyscipione commented Oct 11, 2019

I am using the -e and -B options, and there is only one .gtf file in the directory.
The odd thing is that I have 23 samples (7 triplicates + 2 others). Samples 1-3 have been compared against 4-6, 7-9, and 16-18 with no issues. The only time I get the previously mentioned error is when I compare 1-3 vs 10-12.
Since I am sure everything is set up correctly in this case, I am tempted to think that it’s a stringtie issue and not a syntax problem. Any further suggestions? Thank you all!

The error is:
Traceback (most recent call last):
  File "/cluster/home/cscipion/scripts/prepDE", line 257, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'B1pr_S13_L002'

@gpertea
Owner

gpertea commented Oct 14, 2019

I've added some consistency checking to the prepDE.py script when reading the input data; it should catch some common usage errors.
Could you please download the latest prepDE.py script, place it in your working directory, make sure it's executable, and then run it again with the same parameters you used before, but this time add the -v option and capture the output in a file, with a command like this:

./prepDE.py (your parameters here) -v 2>&1 | tee prepDE.log

(Use the link above to get this updated script, or you can also download the attached prepDE.py.gz, copy it into your working directory, gunzip it and make it executable, then make sure you run it with ./prepDE.py)

You can then show the prepDE.log here or email it to me.

@AishMandya
Author

AishMandya commented Oct 14, 2019 via email

@nalcala

nalcala commented Oct 18, 2019

Hi,

I am having the same issue (error at second file). The log file using the "new" script is:
processing sample S001_T from file ./S001_T/S001_T_ST.gtf

processing sample S002_T2 from file ./S002_T2/S002_T2_ST.gtf
Error: could not locate transcript S001_T.20797.1 entry for sample S002_T2
Traceback (most recent call last):
  File "prepDE.py", line 283, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'S002_T2'

I don't really understand because the previous line in the script
geneDict[geneIDs[i]].setdefault(s[0],0)
should have created a key for s[0]...

Thanks!
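
A possible explanation for the puzzle above, offered as a guess rather than a confirmed diagnosis: in `geneDict[geneIDs[i]][s[0]]+=v[s[0]]` the `setdefault` call only guarantees the key on the left-hand side, while a lookup such as `v[s[0]]` on the right-hand side can still raise `KeyError` if the per-transcript data has no entry for that sample. The sketch below reproduces that behaviour with simplified, hypothetical data structures (`t_dict`, `gene_of`, and the gene ID are made up for illustration and may not match prepDE.py).

```python
# Hypothetical, simplified stand-ins for the structures used in prepDE.py.
samples = [("S001_T", "a.gtf"), ("S002_T2", "b.gtf")]
gene_of = {"S001_T.20797.1": "S001_T.20797"}  # transcript -> gene (made up)
# Per-transcript counts: this transcript was only seen in the first sample.
t_dict = {"S001_T.20797.1": {"S001_T": 12}}


def sum_by_gene(t_dict, gene_of, samples, strict=True):
    """Sum transcript counts per gene and sample."""
    gene_dict = {}
    for tid, counts in t_dict.items():
        per_sample = gene_dict.setdefault(gene_of[tid], {})
        for s in samples:
            per_sample.setdefault(s[0], 0)  # key now exists on the left side
            if strict:
                # The KeyError originates here, from counts[...], not from
                # per_sample[...], when the sample has no entry for tid.
                per_sample[s[0]] += counts[s[0]]
            else:
                per_sample[s[0]] += counts.get(s[0], 0)
    return gene_dict


try:
    sum_by_gene(t_dict, gene_of, samples)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'S002_T2'
```

The non-strict branch only shows where the exception originates; silently treating a missing entry as zero would hide the inconsistency between the input files.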

@nourelislam

nourelislam commented Oct 27, 2019

@gpertea Is it mandatory to use the -e option? I am working on detecting novel splice sites, so I should omit the -e option.

@gpertea
Owner

gpertea commented Oct 29, 2019

This (the OP) seems to be the same issue as #232, so it should be fixed in the v2.0.4 release.

@gpertea
Owner

gpertea commented Oct 29, 2019

This is also the same as #238. I'll leave only that issue open for a while, for user confirmation that the problem was fixed in v2.0.4.

@nalcala

nalcala commented Jan 17, 2020

@gpertea I am all good now with the new versions, on my side you can close the issue. Thanks a lot!

@ElzaFosneca

Hi @gpertea,

I am running version v.2.2.1 and I'm getting the same error.

prepDE.log

@RJEGR

RJEGR commented Mar 17, 2023

Hi everyone,

I get the same error with StringTie v2.2.1.

Using the prepDE.py version that @gpertea provided, the diagnostic output is:

prepDE.py -i samples.txt -v 2>&1 | tee prepDE.log

processing sample SRR8956796 from file /home/rvazquez/RNA_SEQ_ANALYSIS/ASSEMBLY/STRINGTIE/QUANTIFICATION/DENOVO_MODE/SRR8956796_eB_dir/SRR8956796_eB.gtf
processing sample SRR8956797 from file /home/rvazquez/RNA_SEQ_ANALYSIS/ASSEMBLY/STRINGTIE/QUANTIFICATION/DENOVO_MODE/SRR8956797_eB_dir/SRR8956797_eB.gtf
Error: could not locate transcript MSTRG.31643.1 entry for sample SRR8956797
Traceback (most recent call last):
  File "/home/rvazquez/RNA_SEQ_ANALYSIS/stringtie/prepDE.py", line 284, in <module>
    geneDict.setdefault(geneIDs[i],{}) #gene_id
KeyError: 'MSTRG.31643.1'

Although this issue is closed, no one has mentioned that the StringTie v2.2.1 problem is solved by using prepDE.py3:

prepDE.py3 -i samples.txt -v 2>&1
...
..writing transcript_count_matrix.csv
..writing gene_count_matrix.csv
All done.

@jubiology

jubiology commented May 4, 2023

I also encounter the same problem with all tested versions of Stringtie. When I use the prepDE.py3 script, it gives me a very weird gene count matrix, where samples 2-x show massive zero inflation while sample 1 looks normal. Also, the last line does not look as expected:

[image: screenshot of the gene count matrix]

If anybody has any hints on how to solve this please let me know.

Edit: The error disappeared when I ran stringtie without the -x option. I'm not sure why this option caused the error, but now everything works.
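
To quantify the zero inflation described above, one could compute the fraction of zero counts per sample column. The sketch below assumes the count matrix is a comma-separated file whose first row is a header and whose first column is the gene ID; that layout and the function name are assumptions on my part, not taken from the prepDE documentation.

```python
import csv


def zero_fraction_per_sample(csv_path):
    """Return ({sample: fraction of zero counts}, number of skipped rows).

    Rows with the wrong number of columns or non-numeric counts are
    skipped and counted, so a malformed last line does not abort the scan.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        sample_names = header[1:]  # assumed: first column is the gene ID
        zeros = [0] * len(sample_names)
        rows = 0
        skipped = 0
        for row in reader:
            try:
                values = [float(v) for v in row[1:]]
            except ValueError:
                values = None
            if values is None or len(values) != len(sample_names):
                skipped += 1
                continue
            rows += 1
            for i, value in enumerate(values):
                if value == 0:
                    zeros[i] += 1
    if rows == 0:
        return {name: 0.0 for name in sample_names}, skipped
    return {n: z / rows for n, z in zip(sample_names, zeros)}, skipped
```

A first sample with a much lower zero fraction than all the others would match the pattern reported in this comment.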
