MAF sequence names #1472

Open
austinchipps opened this issue Aug 30, 2024 · 2 comments

Comments

@austinchipps

Hi y'all!

I used Progressive Cactus to align the whole genomes of 8 species. Then, using cactus-hal2maf with the --bedRanges option, I was able to output an alignment that includes only my genes of interest. When I inspect the MAF file, however, the only names attached to sequences are the species.chromosome i.e., Mus_musculus.NC_000072.7. This has become a problem in downstream applications because some of the genes are on the same chromosome, so their sequences have the same name. Even tools like AlignIO from Biopython can't read the MAF file. Is there any way around this issue using the tools provided by Cactus?

Thank you in advance for your help!

@glennhickey
Collaborator

> the only names attached to sequences are the species.chromosome i.e., Mus_musculus.NC_000072.7

What more do you want?

@austinchipps
Author

Hey Glenn,

Apologies if my original question wasn't clear. In my MAF file I have multiple aligned regions (genes in this case) from the same chromosome, so the alignments of 'gene1' and 'gene2' have the same name whenever both genes sit on that chromosome. The chromosome name in my example above is "Mus_musculus.NC_000072.7". Since I have a BED file of target genes that includes their names in addition to their coordinates, I was wondering if y'all had a tool or a recommended way to distinguish alignments of different genes from the same chromosome. My goal is to eventually split the alignment (after converting to FASTA) so that I end up with one alignment file per aligned gene. I haven't been able to do this yet because the sequence names are repeated.

Austin
