I used Progressive Cactus to align the whole genomes of 8 species. Then, using cactus-hal2maf with the --bedRanges option, I was able to output an alignment that includes only my genes of interest. When I inspect the MAF file, however, the only names attached to the sequences are in the form species.chromosome, e.g., Mus_musculus.NC_000072.7. This has become a problem in downstream applications: several of the genes are on the same chromosome, so their sequences end up with the same name. As a result, even tools like AlignIO from Biopython can't read the MAF file. Is there any way around this issue using the tools provided by Cactus?
Thank you in advance for your help!
Apologies if my original question wasn't clear. My MAF file contains multiple aligned regions (genes, in this case) from the same chromosome, so the alignments of 'gene1' and 'gene2' carry the same sequence name whenever both genes sit on that chromosome. In my example above, that name is "Mus_musculus.NC_000072.7". Since I have a BED file of target genes that includes their names in addition to their coordinates, I was wondering if y'all have a tool or a recommended way to distinguish alignments of different genes from the same chromosome. My goal is to eventually split the alignment (after converting to FASTA) so that I end up with one alignment file per aligned gene. I haven't been able to do this yet because the sequence names are repeated across genes.