-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
h(simpleErrpr(msg, call)) error #41
Comments
Thank you very much for reaching out. This seems quite a strange behavior indeed. Would it be possible to paste here the top 5 sequences (with header) from the How many sequences are roughly in these files to begin with? The error seems to stem from the fact that in this Type1 vs Type2 search constellation no With very best wishes, |
Hi Hajk! Wow, thank you so much for responding so quickly! So for more context, I'm trying to group the gene families to fill out an all by all comparison of each. From this I should be able to estimate (along with some synteny analyzes of the gene structure) to improve ortholog grouping across bacterial genera. From this, I am planning on running selection tests and potentially population genetic stats if I am able to. All of this was to prevent comparing too diverged paralogs were I could be inflating/deflating the calculated values. By looping all the files, I have been able to skip past the problematic combinations. For example, the below diagram shows 0 values were some of the files failed (ex. Type1 vs. Type2 Rhodocococcus types). To answer your questions, maybe, though all type files are named consistently in terms of style. Here's the top 5 sequences from Rhodococcus_TYpe1.fasta:
And here are the first 5 sequences from Rhodococcus_Type2.fasta:
The number of files for each heavily varies on the type, with the lowest being three sequences and the most being 3000K+. I don't think it is a size constraint entirely as Xanthomonas_Type1.fasta is OK with 3790 sequences. Let me know if there is any other info I can provide that is of use. Thank you so much!! |
Hi Danielle, Thank you very much for your detailed description! Just to rule out the obvious case: Whenever you have comparisons with lower-bound sequences (towards 3 seqs), could it be that there are simply no hits found in these pairwise cases (given the threshold settings you choose for blasting)? If so, I think I don't have an error-capture routine implemented if no hits were found, because in such cases simply no blastresult_*.csv file will be generated. If it's not this obvious case, then I will start digging now. With many thanks and very best wishes, |
Apologies for the slow response. After further investigation, you are right. The low pairwise hits returned no file due to the default parameters requiring an E-value of 1e-5. Once I lower them (to 1e-1), I was able to return a file (which I wanted even if the values were low). Moving forward, I think a simple fix to help future users would be to add to the original error a warning that lack of file may be due to low/no hits and to check if the default parameters are appropriate. Or something along those lines as the current error focuses on file path and write out functionality, which the latter it is but the way it is written, I don't think I would have thought to check the default parameters. Thanks again for your quick help on this!! |
Hi! I just first wanted to say thank you so much for developing this package. It is quite handy and overall greta resource for so many. I've been using the blasts comparison to confirm some previous classification for grouping a protein family. So far it has worked wonderfully. Though for one fasta file, I ran into the follow:
Rhodo_T1_T2 <- as.data.frame(orthologr::blast(query_file = '~/Documents/Mining_MAMPs/Mining_Known_MAMPs/Analyses/Catagorizing_CSPs/CSP_types/Typing_fasta_files/Rhodococcus_Type1.fasta', subject_file = '~/Documents/Mining_MAMPs/Mining_Known_MAMPs/Analyses/Catagorizing_CSPs/CSP_types/Typing_fasta_files/Rhodococcus_Type2.fasta', seq_type = 'protein'))
Running blastp: 2.11.0+ ...
Building a new DB, current time: 06/16/2023 17:03:59
New DB name: /tmp/RtmpvS42vk/_blast_db/blastdb_Rhodococcus_Type2.fasta_protein.fasta
New DB title: blastdb_Rhodococcus_Type2.fasta_protein.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /tmp/RtmpvS42vk/_blast_db/blastdb_Rhodococcus_Type2.fasta_protein.fasta
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 5 sequences in 0.000640154 seconds.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': File blastresult_Rhodococcus_Type1.fasta.csv could not be read correctly. Please check the correct path to blastresult_Rhodococcus_Type1.fasta.csv or whether BLAST did write the resulting hit table correctly.
Type1 fast file is fine when comparing against itself or the other two types. Type 2 errors against Type1 and Type3 but runs ok comparing against itself. I've tried reuploading the fasta files but that did nothing. Any ideas? Thanks in advance!
The text was updated successfully, but these errors were encountered: