/human/annotation.Libs/human_trna.str does not exsit. Please check it. #13

ahsen1402 opened this issue Sep 28, 2018 · 18 comments

@ahsen1402

Hi,
I downloaded the human library with the command

wget -O human.tar.gz https://jh.box.com/shared/static/rj7ufy5v15uw7ytsyyrsryw99u7ml82j.gz

but when running miRge I got the error "/human/annotation.Libs/human_trna.str does not exsit. Please check it.", so I guess the database needs to be updated. Is there anywhere else I can download it for now (along with any other possibly missing files)?

Thanks in advance.

@mhalushka
Owner

Thank you for writing. That needs to be updated, but the person who was making the fix has not completed it yet. In the meantime, if you send me an email ([email protected]), I can get you the files that were left out of the last .gz file.

@mhalushka
Owner

missingtrffiles.zip
Actually, I think this includes all of the missing files.

@ahsen1402
Author

Hi Mark,

Thanks for sharing this. I will try it and let you know if I have any further issues. One more quick question about the file unmapped.csv: are those reads that aligned to the human genome but have no known annotation, or are they just the rest of the reads that do not appear in the mapped.csv file?

Thanks

@mhalushka
Owner

The reads in the unmapped.csv file are all the reads that are not appearing in the mapped.csv file. They have not been aligned to the human genome. I occasionally blast abundant reads from that file and find some align to repeat elements in the human genome. Many do not align to anything. I hope that is helpful.
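
As a rough illustration of picking abundant reads out of unmapped.csv for a BLAST search, here is a minimal Python 3 sketch. The column layout is an assumption that this thread does not confirm: a header row, the read sequence in the first column, and one count column per sample. The function names are hypothetical and not part of miRge.

```python
import csv
import io


def top_unmapped_reads(csv_text, n=5):
    """Return the n most abundant (sequence, total_count) pairs.

    Assumed layout (not confirmed in this thread): a header row, the read
    sequence in the first column, and one read-count column per sample.
    """
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the assumed header row
    totals = []
    for row in rows:
        if not row:
            continue
        # Sum the per-sample counts, ignoring empty cells.
        counts = [int(float(cell)) for cell in row[1:] if cell.strip()]
        totals.append((row[0], sum(counts)))
    totals.sort(key=lambda item: item[1], reverse=True)
    return totals[:n]


def to_fasta(reads):
    """Format (sequence, count) pairs as FASTA text that can be pasted into BLAST."""
    return "".join(
        ">read%d_count%d\n%s\n" % (i + 1, count, seq)
        for i, (seq, count) in enumerate(reads)
    )
```

To use it, read the file contents into a string (for example with `open("unmapped.csv").read()`) and pass the result of `top_unmapped_reads` to `to_fasta`. If the real file has a different layout, the column indices need adjusting.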

@ahsen1402
Author

Hi Mark,

With the files you provided I was able to continue running, but this time I ran into another issue.
Here is a snapshot of the error; any idea what might be wrong?

Process Worker-2:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-3:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-4:

@mhalushka
Owner

Yes - someone else had the same error and we think the problem is the Python packages. Specifically, we think the error occurs if you are using cutadapt v1.18; you need to use cutadapt v1.11. If that doesn't work, please make sure all of your Python packages exactly match these:
cutadapt(v1.11), biopython(v1.68), numpy(v1.11.3), scipy(v0.17.0), matplotlib(v2.1.1), pandas(v0.21.0), sklearn(v0.18.1), reportlab(v3.3.0) and forgi(v0.20).
We think there is a forward incompatibility problem that we need to solve. Thank you for letting me know and let me know if this solves the problem.
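
One way to compare an environment against the versions listed above is a small script like the following. This is only a sketch: the distribution names are assumptions (in particular, "sklearn" is assumed to be installed as "scikit-learn"), and the helper names are hypothetical rather than part of miRge.

```python
# Versions quoted in this thread. The distribution names are assumptions;
# for example, "sklearn" is assumed to be published as "scikit-learn".
REQUIRED = {
    "cutadapt": "1.11",
    "biopython": "1.68",
    "numpy": "1.11.3",
    "scipy": "0.17.0",
    "matplotlib": "2.1.1",
    "pandas": "0.21.0",
    "scikit-learn": "0.18.1",
    "reportlab": "3.3.0",
    "forgi": "0.20",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters such as 2.7
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in sorted(required.items()):
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(REQUIRED).items()):
        print("%s: expected %s, found %s" % (name, wanted, found or "not installed"))
```

Run it with the interpreter of the environment in which miRge is installed. An empty output means every listed package matches exactly.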

@ahsen1402
Author

Hi Mark,

Will try this. One quick remark to make your job easier: I am already feeding in fastq files that I trimmed beforehand, so I am not using any adapter option. Do you think miRge still calls cutadapt? I can confirm that all packages are the same; I was able to run miRge in the same environment back in May. However, I recently tried to update it using bioconda, which I think started the problem.

@mhalushka
Owner

That is interesting. I think miRge still calls cutadapt even if they are trimmed files as it still removes poor quality reads through that function of cutadapt. I'm sorry the update caused the problem and we'll try to figure out what we might have changed (besides leaving out some tRNA files). I know the last version of miRge added a tRF finder which was a significant change/addition to the program. It's possible that caused some incompatibilities that you are now seeing. If the packages fix doesn't work, please let me know.

@mhalushka
Owner

Were you able to get it to run, or is it still failing?

@ahsen1402
Author

Hi Mark,
Sorry for my late reply. Unfortunately, I got this error this time:

Performing annotation for all of the collasped sequences...
All annotation cycles completed (6837.66 sec).

Summarizing and tabulating results...
Traceback (most recent call last):
  File "/soft/enter/envs/mirge1/bin/miRge2.0", line 11, in <module>
    load_entry_point('mirge==2.0', 'console_scripts', 'miRge2.0')()
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/__main__.py", line 389, in main
    writeDataToCSV(outputdir, annotNameList, sampleList, isomirDiff, a_to_i, logDic, seqDic, mirDic, mirNameSeqDic, mirMergedNameDic, bowtieBinary, genome_index, numCPU, phred64, removedMiRNA_ai_List, spikeIn, gff_output, isomiRContentDic, miRNA_database, trf_output, trfContentDic, trnaStruDic, pre_tRNA_index, duptRNA2UniqueDic, trnaAAanticodonDic, tRNAtrfDic, trfMergedNameDic, trfMergedList)
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 1080, in writeDataToCSV
    mimatchState, mismatchPosition = detectMismach(contentTmp[0], tRNA_seq, contentTmp[2])
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 541, in detectMismach
    if target_seq[i] != seqTmp[i]:
IndexError: string index out of range

@mhalushka
Owner

I'll think about the problem a bit more tomorrow, but I see the annotation cycles took 6837 seconds, which is a really long time. I suspect you aligned multiple fastq files at once, and I wonder if you hit some sort of max buffer in your RAM that caused the write function to fail. If you run only one .fastq file, do you get the same error?

@ahsen1402
Author

Hi Mark,

With one sample it finished without errors:

Summarizing and tabulating results...
The number of A-to-I editing sites for is less than 10 so that no heatmap is drawn.
Summary Complete (150.93 sec)
Annotation of miRge2.0 Completed (412.83 sec)

Is the algorithm deterministic up to the bowtie assignments, so that I can run my data in batches, or does it use information from other samples while analyzing a given sample?

@mhalushka
Owner

I'm glad it partially worked. If you are just annotating, you could run them all one at a time or in smaller batches and it won't have any negative effects. If you are trying to identify novel miRNAs, you may end up with a more repetitive experience, but still get the correct data. I frequently run up to 10 samples together without any issues. I've done more, but only with smaller fastq files (<2 million reads each).
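
For annotation-only runs, splitting a long sample list into smaller groups can be sketched as below. The function name and the default batch size of 10 (taken from the "up to 10 samples" remark above) are illustrative assumptions; the miRge command line itself is deliberately left out because its options are not shown in this thread.

```python
def make_batches(fastq_files, batch_size=10):
    """Split a list of fastq paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        fastq_files[start:start + batch_size]
        for start in range(0, len(fastq_files), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    samples = ["sample%02d.fastq" % i for i in range(1, 24)]
    for number, batch in enumerate(make_batches(samples), start=1):
        print("batch %d: %d files" % (number, len(batch)))
```

Each batch would then be passed to a separate miRge run. Smaller batches may be preferable for large fastq files, since the larger runs mentioned above used files with fewer than 2 million reads each.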

@chenlx2014

Hi Mark,
I can't download the human library from the URL. Can you give me the latest human library files?

@arunhpatil

@chenlx2014 Can you provide your email address? I will share the zip file for the human libraries.

@chenlx2014

[email protected]
Thanks for sharing.

@chenlx2014

My fastq file is phred33, but the default format is phred64.
What should I do?

@mhalushka
Owner

I think our help page is incorrect. The default is phred 33. Please run the file without calling -phred64 and it should be fine. Let me know if you have any problems with that.
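
If it is unclear which encoding a fastq file uses, the quality lines can be inspected with a heuristic such as the sketch below. This is a general rule of thumb and not something miRge does: ASCII codes below 59 cannot occur in 64-offset encodings, while codes above 74 ('J') are unusual for phred33 Illumina output. The function name is hypothetical.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from fastq quality lines.

    Returns None when the characters seen fit both encodings.
    """
    lowest = None
    highest = None
    for line in quality_strings:
        for ch in line.strip():
            code = ord(ch)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)
    if lowest is None:
        return None  # no quality characters were supplied
    if lowest < 59:
        return 33  # too low for any 64-offset encoding
    if highest > 74:
        return 64  # heuristic: above 'J' is unusual for phred33 data
    return None  # ambiguous range
```

The quality line is every fourth line of a fastq record, so passing the first few thousand of them is usually enough. A result of 33 means the file can be run without the -phred64 option.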
