I have several thousand FASTA files; each one represents a single gene and contains all sequenced individuals from a single population. Each FASTA file is a nucleotide alignment, which I have attempted to align in frame using TranslatorX. I would like to calculate piN/piS for each gene using snpgenie_within_group.pl.
I do not have a GTF file. Can I run snpgenie_within_group.pl without one? If not, can you offer any guidance on how I would format this type of sequence data and generate a correctly formatted GTF file?
Thanks
Thanks a lot for the question @annasimonsen! Unfortunately I wrote the script with chromosomes (not genes) in mind and did not have enough foresight to allow flexible usage without the GTF. However, I think your pipeline could auto-generate temporary GTF files on the fly. For example, suppose your directory contained three files: gene1.fasta, gene2.fasta, and gene3.fasta. You'll probably be looping through these files somehow, perhaps using a wrapper Unix script. When you hit gene1.fasta, you can determine the length of the sequences inside (I think bioawk has something ready-made, or you could get clever with cat, grep, and awk) and then write a file called gene1.gtf. For example, if gene1.fasta is an alignment of sequences with 693 nucleotides, you'd simply write a 1-line GTF file describing a single gene that spans sites 1 to 693.
Then provide that temp file as an argument to SNPGenie, and delete it when finished (or whatever you prefer). In other words, every temp GTF file you produce will be a single line that specifies one gene beginning at 1 (in every case) and ending at the last site (i.e. the alignment length).
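The approach described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names (`alignment_length`, `gtf_line`, `write_temp_gtf`) are made up for this example, and the GTF field values used here (the FASTA file name as the sequence name, `temp` as the source, a `CDS` feature on the `+` strand, and a `gene_id` attribute taken from the file name) are guesses at a reasonable layout. Check them against the SNPGenie documentation before relying on them.

```python
from pathlib import Path


def alignment_length(fasta_path):
    """Return the shared sequence length of an aligned FASTA file."""
    lengths = set()
    current = None  # running length of the record being read
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.add(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.add(current)
    # An alignment should have exactly one sequence length.
    if len(lengths) != 1:
        raise ValueError(
            f"{fasta_path}: expected one alignment length, found {sorted(lengths)}"
        )
    return lengths.pop()


def gtf_line(seq_name, gene_id, length):
    """Build a one-line, tab-delimited GTF record spanning sites 1..length.

    The source, feature, strand, and frame values are assumptions.
    """
    fields = [
        seq_name, "temp", "CDS", "1", str(length),
        ".", "+", "0", f'gene_id "{gene_id}";',
    ]
    return "\t".join(fields) + "\n"


def write_temp_gtf(fasta_path):
    """Write <gene>.gtf next to <gene>.fasta and return its path."""
    fasta_path = Path(fasta_path)
    gtf_path = fasta_path.with_suffix(".gtf")
    length = alignment_length(fasta_path)
    gtf_path.write_text(gtf_line(fasta_path.name, fasta_path.stem, length))
    return gtf_path
```

A wrapper could then loop over `sorted(Path(".").glob("*.fasta"))`, call `write_temp_gtf` on each file, pass the resulting GTF to SNPGenie, and delete it afterwards. Raising an error when sequences differ in length is a deliberate choice here, since a single end coordinate only makes sense for a proper alignment.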