How to add more marker genes #28
Comments
Hi Bruno @mudymudy,
We are glad you were able to run read2tree successfully. As you mentioned, read2tree needs both amino acid and nucleotide sequences to map the sequencing reads onto the protein/gene markers. Yes, it is possible to generate gene markers for your set of species using OMA standalone, and we describe it in our wiki here. Note that the viral dataset is an example and the instructions are valid for eukaryotes. Another point is that the instructions are based on the FASTA record ID format of NCBI RefSeq. I think UniProt does not provide the nucleotide sequences, as mentioned here.
You mention that there are some species in the OMA browser that you could benefit from. I'm afraid that you cannot "directly" use those OGs and add new species to them. If you are considering fewer than 50 species, I would suggest running OMA standalone from scratch using all AA sequences of the species (and keeping their nt sequences with similar FASTA record IDs for read2tree). But if you have more species and limited computation, you can first use this export All-All section to download the all-vs-all comparison of those species in OMA. Then, use description in here under the section
Alternatively, you can use the sequencing reads of
Please let us know which part is not clear and we'll try our best to describe it in detail.
Best regards,
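For anyone following along, here is a minimal sketch of how the AA and nt record IDs could be compared before running read2tree. It assumes that a record's ID is the first whitespace-delimited token after `>` and that the AA and nt IDs are expected to match exactly; the exact matching rule read2tree applies may differ, so treat this only as a sanity check. The function names and the example records are hypothetical.

```python
def fasta_ids(lines):
    """Return the set of record IDs (first token after '>') found in FASTA lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def compare_ids(aa_lines, dna_lines):
    """Return (AA IDs with no DNA record, DNA IDs with no AA record)."""
    aa = fasta_ids(aa_lines)
    dna = fasta_ids(dna_lines)
    return sorted(aa - dna), sorted(dna - aa)


# Hypothetical toy records; in practice, pass open file handles instead.
aa_example = [">sp1_gene1 some description", "MKV", ">sp1_gene2", "MLA"]
dna_example = [">sp1_gene1", "ATGAAAGTT", ">sp1_gene3", "ATGCTGGCA"]

missing_dna, missing_aa = compare_ids(aa_example, dna_example)
print("AA records without DNA:", missing_dna)
print("DNA records without AA:", missing_aa)
```

With real data, `compare_ids(open(aa_path), open(dna_path))` would work the same way, since file handles iterate line by line.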
Dear Sina,
Thank you for your reply! When I run:
read2tree --standalone_path marker_genes --output_path output --reference --dna_reference data/all_cdna_out.fa
I get this error message:
2023-05-22 12:25:52,725 - read2tree.Aligner - INFO - OG1355 with error Sequences must all be the same length
This is just one example; many OGs give the same message. Do you know whether this is a critical error, an error that can be ignored, or something that can be fixed?
Thank you again!!
You're welcome.
Thank you for your reply! The first lines of 01_ref_ogs_aa/OG1355.fa:
Then the first lines of 01_ref_ogs_dna/OG1355.fa:
The log file: Thank you so much!!
Thanks for sharing the log file. Based on the dates, I guess the log file contains the results of two runs. It seems that in the first run you didn't use I would suggest using a new name for the output folder with
To make the process faster, you may want to use the top 50-200 biggest OGs from Hope it helps. Please keep us updated.
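A rough sketch of how the biggest OGs could be picked out of a folder of OG FASTA files. It assumes that "biggest" means the most sequences per file and that the files end in `.fa` (as in `OG1355.fa`); the folder names `all_ogs` and `marker_subset` and the function names are hypothetical. The same selection would presumably have to be applied to the matching nucleotide files as well.

```python
from pathlib import Path
import shutil
import tempfile


def count_records(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def largest_ogs(og_dir, top_n):
    """Return the top_n .fa files with the most records (ties broken by name)."""
    files = sorted(Path(og_dir).glob("*.fa"))
    ranked = sorted(files, key=lambda p: (-count_records(p), p.name))
    return ranked[:top_n]


def copy_largest_ogs(og_dir, out_dir, top_n):
    """Copy the top_n largest OG files into out_dir and return their names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    selected = largest_ogs(og_dir, top_n)
    for path in selected:
        shutil.copy(path, out / path.name)
    return [p.name for p in selected]


# Toy demonstration in a temporary folder with three small OG files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "all_ogs"
    src.mkdir()
    (src / "OG1.fa").write_text(">a\nMK\n>b\nMK\n")
    (src / "OG2.fa").write_text(">a\nMK\n>b\nMK\n>c\nMK\n")
    (src / "OG3.fa").write_text(">a\nMK\n")
    picked = copy_largest_ogs(src, Path(tmp) / "marker_subset", top_n=2)
    print(picked)
```

For a real run, `top_n` would be somewhere in the 50-200 range mentioned above.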
Thank you for your reply. I ran OMA again to create the marker genes and everything went fine. Then when I run:
read2tree --standalone_path marker_genes --output_path output --reference --dna_reference all_cdna_out.fa
I get the same error. This is the mp.log of that specific run. Thank you for your time!
I couldn't make read2tree work with my own generated marker genes, but I followed the other directions: I added extra fastq files and ran it in multi-species mode, and that did the job. But I have another question. When I export marker genes from the OMA browser and select, say, all of the E. coli organisms plus a random one, e.g. Moranella endobia, would that make Moranella the outgroup? Or do I have to specify the outgroup somehow in the read2tree step? Thank you again for your help!
Sorry I haven't come back to you yet; we are discussing the case internally and will update you asap. About your question: read2tree doesn't ask for outgroup information. However, the inferred tree (by IQ-TREE, as part of read2tree) is unrooted, and the rooting should be done using the outgroup species. You can do it with the ete3 package or visually with phylo.io.
Hi @mudymudy,
I had a look at the mp.log and your sequences. To me, it seems that there is an inconsistency: your OG1355 contains only 3 sequences, but the log says 8 sequences. From the formatting of the sequences above, it could be that you also put a '>' on the line with the actual sequence. Could that be true? It seems rather unlikely, as the problem occurs only on a few OGs...
Cheers, Adrian
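One way to test the stray '>' hypothesis is a small check like the sketch below. It parses FASTA lines, counts the records, and reports headers that are not followed by any sequence, which is what a '>' placed on a sequence line would produce. This is only an assumption-based diagnostic, not how read2tree itself parses files; the function names and the toy input are hypothetical.

```python
def parse_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def empty_records(records):
    """Return headers whose record has no sequence lines at all."""
    return [header for header, seq in records if not seq]


# Toy input: the second line is a sequence that wrongly starts with '>'.
example = [">s1", ">MKVL", ">s2", "MKIL"]
records = parse_fasta(example)
empty = empty_records(records)
print("records found:", len(records))
print("headers without sequence:", empty)
```

If the record count for an OG file is higher than the number of species expected, or any header shows up without a sequence, the file formatting is worth a closer look.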
Dear @mudymudy,
Regarding creating gene markers using OMA, we are keen to investigate the issue further. Would it be possible to share with us the dataset that you are working on? Specifically, the nucleotide and amino acid sequences of Clostridium (I guess) that you used with OMA standalone. Then we can run OMA and read2tree to find the issue. (You could also email us at sina.majidian gmail.com)
Thank you in advance!
Dear developers,
I've been using r2t successfully when I run it employing marker genes from the OMA db.
My problem is that I want to create a more specific or sensitive tree using Clostridium marker genes that I obtained from the OMA website together with some Clostridium amino acid sequences that I downloaded from UniProt (from C. septicum and C. chauvoei, which are not present in the OMA database).
Is that possible? I ran the OMA standalone software with those sequences and it gave me the OG.fa files, which are the OMA group files. But when I use those files with r2t, it says that the sequences are not nucleotide sequences.
Then I realised that the OMA marker genes have both AA and nucleotide sequences. Is there any way to include new genomes that are not present in the OMA db? I would like to add some C. septicum and C. chauvoei to my tree but I can't find a way to do it.
Thank you in advance!
Bruno