A simple script to modify a reference genome fasta file using a bed file of SNPs.
Requirements:
- pybedtools
- numpy
- pandas
Usage:
python3.8 modify-fasta.py -h
-h, --help "Show this help message and exit"
-bed <filename> "Input bed file"
-fi <filename> "Input (reference genome) fasta file, passed to bedtools"
-o, --output <filename> "The prefix of the output fasta file(s)"
-n <INT> "Number of output fasta files. Must be 1 or 2 [default=1]."
If n=1, one fasta file is generated:
* output.fa, built using the minor alleles (4th column of the bed file)
If n=2, two fasta files are generated:
* output.1.fa, built using the major alleles (3rd column of the bed file)
* output.2.fa, built using the minor alleles (4th column of the bed file)
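The mapping from -n to output files and allele columns can be summarized with the small helper below. It is a hypothetical sketch that only restates the rules above; `output_plan` and the column names `allele1`/`allele2` are illustrative assumptions, not taken from the script's source.

```python
from typing import List, Tuple


def output_plan(prefix: str, n: int = 1) -> List[Tuple[str, str]]:
    """Return (output filename, allele column) pairs for the documented -n modes.

    Hypothetical helper: names and columns follow the README text only.
    """
    if n == 1:
        # One fasta file built from the minor allele (4th column)
        return [(f"{prefix}.fa", "allele2")]
    if n == 2:
        # .1 from the major allele (3rd column), .2 from the minor allele (4th column)
        return [(f"{prefix}.1.fa", "allele1"), (f"{prefix}.2.fa", "allele2")]
    raise ValueError("-n must be 1 or 2")


print(output_plan("outname", 2))
```

Using `typing.List`/`Tuple` keeps the annotations valid on Python 3.8, the version used in the usage lines.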
Examples:
python3.8 modify-fasta.py -bed in.bed -fi in.fasta
python3.8 modify-fasta.py -bed in.bed -fi in.fasta -o outname
python3.8 modify-fasta.py -bed in.bed -fi in.fasta -o outname -n 2
in.bed is a space-delimited text file with four fields per line and no header line:
- chr - The number/name of the chromosome
- position - The end position of the SNP (for a single-base SNP in bed coordinates, this is its 1-based position)
- allele1 - usually the major allele
- allele2 - usually the minor allele
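One way to load a file in the four-field layout above is with pandas, which is already a dependency. This is a sketch under assumptions: the function name `read_snp_bed` and the column names are illustrative, and the script itself may read the file differently (for example through pybedtools).

```python
import io

import pandas as pd


def read_snp_bed(path_or_buffer) -> pd.DataFrame:
    """Read the space-delimited, header-less SNP file into a DataFrame.

    Hypothetical helper: column names mirror the four fields listed above.
    """
    return pd.read_csv(
        path_or_buffer,
        sep=r"\s+",  # one or more spaces/tabs between fields
        header=None,  # the file has no header line
        names=["chr", "position", "allele1", "allele2"],
        # keep chromosome names as text so "1" and "chr1" are treated alike
        dtype={"chr": str, "position": int, "allele1": str, "allele2": str},
    )


df = read_snp_bed(io.StringIO("1 10505 A T\n1 10506 C G\n"))
print(df)
```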
Example (example_in.bed):
1 10505 A T
1 10506 C G
1 10511 G A
1 10539 C A
1 10542 C T
1 10579 C A
1 10642 G A
1 11008 C G
1 11012 C G
1 11063 T G
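The core substitution, writing one allele at each listed position, can be sketched in plain Python as below. This is not the script's implementation: `apply_snps` is a hypothetical name, the genome is held in memory as a dict of sequences, and `position` is assumed to be the bed end coordinate (1-based), so the string index is `position - 1`. It also assumes chromosome names in the fasta match the first column (e.g. `1`, not `chr1`).

```python
from typing import Dict, Iterable, Tuple

# (chr, position, allele1, allele2), in the same order as the example rows above
Snp = Tuple[str, int, str, str]


def apply_snps(genome: Dict[str, str], snps: Iterable[Snp], use_minor: bool = True) -> Dict[str, str]:
    """Return a copy of `genome` with each SNP position set to one allele.

    use_minor=True writes allele2 (4th column); False writes allele1 (3rd column).
    """
    seqs = {name: list(seq) for name, seq in genome.items()}
    for chrom, pos, allele1, allele2 in snps:
        if chrom not in seqs:
            continue  # skip SNPs on sequences absent from the fasta
        seqs[chrom][pos - 1] = allele2 if use_minor else allele1  # 1-based -> 0-based
    return {name: "".join(bases) for name, bases in seqs.items()}


genome = {"1": "ACGTACGTAC"}
snps = [("1", 1, "A", "T"), ("1", 4, "T", "G")]
print(apply_snps(genome, snps))                   # {'1': 'TCGGACGTAC'}
print(apply_snps(genome, snps, use_minor=False))  # {'1': 'ACGTACGTAC'}
```

Calling it once with `use_minor=True` corresponds to the n=1 output; calling it with both settings gives the pair of files described for n=2.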