No tree generated after running multiple species mode #8
Comments
Hi Willem, were the file |
Hi Adrian, Yes, both files were created. The concat files are also created per input species. There are also text files called "input_species". I am happy to share any further files if this helps. All the best, |
Hi Willem, sorry for the slow progress on this. I had an initial look at the problem but couldn't spend much time on it. At first glance, it seems that the subsampling of your input species for MD.fq.gz produces too few hits to place the sequences. In my test run, the resulting alignment contains no trace of the MD species at all. I'm not sure how high the original coverage of the ENA dataset is, but subsampling a very low-coverage dataset from a huge genome could be the issue, as we simply get too few hits. We will try to verify this hypothesis further, but in the meantime it might make sense to run read2tree with the full read set to see if it produces some hits. Best wishes |
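To make the low-coverage hypothesis above concrete, here is a minimal sketch of the usual depth estimate (total sequenced bases divided by genome size). It is not part of read2tree; the function name and the example numbers (one million 150 bp reads, a 3 Gb genome) are hypothetical and only illustrate how quickly subsampling a large genome drops below 1x.

```python
def estimated_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return the expected sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical numbers, not taken from the dataset discussed in this thread.
depth = estimated_coverage(num_reads=1_000_000, read_length=150, genome_size=3_000_000_000)
print(f"approximate depth: {depth:.2f}x")
```

If the estimate is far below 1x, most marker genes will receive few or no reads, which would be consistent with an alignment that contains no sequence for that species.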
Hi Adrian, Thanks for getting back to me. The same error popped up: Attached here is the mplog.log. Hope this helps in identifying the issue. All the best and have a good weekend! |
Dear Willem, Sorry for the late reply. I managed to run your example successfully on our cluster after adapting the code that I just pushed. Please try it again with the new code. I basically used your marker genes and ran the following lines:
For each read set you add, you should see X_all_cov.tx files showing which OG and sequence were successfully mapped to. Please let me know if there are still issues. Best, |
Dear David, Thank you for getting back to me. I re-ran the whole pipeline using the example command you mentioned above.
The
Below is a copy of the log as observed within the terminal.
All the best, |
Hi David and Willem,
We are also using our own reference dataset, so chances are the problem might be related to that. We have some promising-looking output in the sample-specific fa files(!), but no alignments or trees are generated. |
Hi David @combosch PS: I just noticed that you are using your own reference dataset. Could you describe it in more detail? Did you run OMA standalone? read2tree requires very specific formatting of the input fasta and fna files, some of which is described here. For example, the first five letters of a fasta record should be unique, and the nucleotide sequences should only include coding sequences.
I guess I described many things at once. It might be better to discuss step by step and make sure each step works as intended. |
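As a quick way to check the five-letter rule mentioned above, here is a minimal sketch that scans FASTA headers and reports record IDs sharing the same first five characters. It is not read2tree code. It assumes the intended reading is that no two records within one marker-gene file share a five-character prefix; the function name and the example IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_prefixes(fasta_text: str, prefix_len: int = 5) -> Dict[str, List[str]]:
    """Group FASTA record IDs by their first `prefix_len` characters.

    Returns only the prefixes that occur in more than one record.
    """
    seen = defaultdict(list)
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            # The record ID is taken as the first whitespace-separated token.
            record_id = parts[0] if parts else ""
            seen[record_id[:prefix_len]].append(record_id)
    return {prefix: ids for prefix, ids in seen.items() if len(ids) > 1}


# Hypothetical records: two of them share the prefix "MOUSE".
example = ">MOUSE00001\nMKT\n>HUMAN00002\nMKS\n>MOUSE00003\nMRT\n"
print(duplicate_prefixes(example))
```

An empty result means every record in the file has a distinct five-character prefix.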
Thanks for your fast reply Sina!!
|
You're welcome. Right, when you have one species, uniqueness of the species code should be fine. I assume you want to use multiple species mode, with several sequencing read datasets from different samples? Otherwise, one reference and one sample are not enough, as a tree should have at least 3 leaves. Regarding the reference (the gene marker folder and the dna_ref.fa files), Read2tree uses the exact order of the codons, meaning that every 3 letters in the DNA should correspond to one letter in the amino-acid sequence (and vice versa). And I can see these lines in the log
It seems that read2tree couldn't append the "assembled" sequence to the reference gene. Read2tree does that at both the DNA and amino-acid levels. Could you check the output folder whether you have Best, |
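The codon correspondence described above (every 3 DNA letters matching one amino-acid letter) can be sanity-checked with a length comparison. This is a minimal sketch, not read2tree's own validation. The function name and the `allow_terminal_stop` option are hypothetical, and whether a trailing stop codon is acceptable in the reference is an assumption to verify against the read2tree documentation.

```python
def codon_length_matches(dna: str, aa: str, allow_terminal_stop: bool = False) -> bool:
    """Check that a coding DNA sequence is exactly three times as long as its protein.

    With allow_terminal_stop=True (an assumption, not a documented read2tree rule),
    one extra trailing codon in the DNA is tolerated.
    """
    dna = dna.strip()
    aa = aa.strip()
    if len(dna) == 3 * len(aa):
        return True
    return allow_terminal_stop and len(dna) == 3 * (len(aa) + 1)


print(codon_length_matches("ATGAAA", "MK"))   # 6 nucleotides, 2 residues
print(codon_length_matches("ATGAA", "MK"))    # 5 nucleotides, 2 residues
```

A length check only catches truncated or non-coding sequences; it does not confirm that each codon actually translates to the paired residue.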
Hi Sina,
Thanks a lot for your help!! |
Hi David, sorry I couldn't respond, as I was travelling. Please let us know if you have made some progress on this. |
Hi Sina, no problem! |
Dear Authors, dear Sina and Adrian,
Sorry to contact you once more in this short timeframe.
After the previous issue was fixed, I re-installed read2tree and successfully ran the test dataset.
Thereafter, I re-ran my previously attempted command with 3 additional species.
Aligning now seems to run well; however, during the merge / tree inference I get the following error:
Once more, the file mentioned
/scratch/iqtree6xdi1c5c/tmp_output.treefile
was not to be found.
The full command used was:
Where the reads of species 2 to 4 were generated and subsampled in the same way as the previously shared MB.fq
The nohup file is added here
nohup.txt
Since the mplog file was too big, I have uploaded it to wetransfer:
https://we.tl/t-UF9I5I3dqK
Kind regards,
Willem