Scripts and a config file for building a sequence alignment from a set of GPCR protein sequences.

See snooker_align.vsd for workflow.

The original alignment script was made for https://dx.doi.org/10.1186/1471-2105-12-332

Requirements

NCBI BLAST
HMMER
Clustal Omega (clustalo)
Perl packages:
- Bio::SeqIO
- Text::LevenshteinXS

Gapped alignment based on gpcrdb human swissprot alignment

Steps to get an alignment

1. Create numbering schema
2. Download the human Swiss-Prot alignment csv from the GPCRdb website
3. Convert the csv so that it contains only the positions of the numbering schema
4. Create fasta from csv
5. Run BLAST with the seed alignment as query against Swiss-Prot/TrEMBL/Ensembl
   5.1. Make sure all seed sequences have been found
6. Retrieve sequences of ids
7. Make sequences unique within same species
8. Remove species with fewer than 100 sequences
9. Run the per-TM alignment script
10. Remove sequences that differ by fewer than 9 aa from another sequence of the same species
11. Remove species with fewer than 100 sequences
12. Make tree of sequences
13. Generate entropy file based on tree
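
The two filtering steps above (dropping near-identical sequences within a species, then dropping species with too few sequences) can be sketched as follows. This is only an illustration in Python, not the repository's own Perl scripts. The function names, the `(species, id, sequence)` record layout, the greedy keep-first strategy and the use of Levenshtein distance as the measure of "aa different" are all assumptions, the last one suggested by the Text::LevenshteinXS requirement.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def drop_near_duplicates(records, min_distance=9):
    """Group records by species, keeping a sequence only if it differs by at
    least min_distance from every sequence already kept for that species.
    Assumption: greedy and order dependent, the first sequence seen wins."""
    kept = defaultdict(list)
    for species, seq_id, seq in records:
        if all(levenshtein(seq, other) >= min_distance
               for _, other in kept[species]):
            kept[species].append((seq_id, seq))
    return dict(kept)


def drop_small_species(by_species, min_sequences=100):
    """Remove species that have fewer than min_sequences sequences left."""
    return {species: seqs for species, seqs in by_species.items()
            if len(seqs) >= min_sequences}


if __name__ == "__main__":
    # Hypothetical toy records: (species, id, sequence).
    records = [
        ("human", "a", "AAAAAAAAAAAA"),
        ("human", "b", "AAAAAAAAAAAC"),  # 1 aa from "a", so it is dropped
        ("human", "c", "CCCCCCCCCCCC"),
        ("mouse", "d", "AAAAAAAAAAAA"),
    ]
    by_species = drop_near_duplicates(records)
    # The threshold is lowered from 100 to 2 only to fit the toy data.
    print(drop_small_species(by_species, min_sequences=2))
```

The pairwise comparison is quadratic in the number of sequences per species, so for large BLAST result sets a compiled edit-distance implementation would be preferable to the pure-Python one shown here.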

See runs.md for commands to perform the steps.