This is done for each NAM, replace the NAM with the actual NAM name and run it on each NAM scaffolds (except B73 and B73Ab10) The map files for each NAM should be placed in their own folder
Mo18W example run:
build index for your scaffolds:
00_build_index_hisat2.sh Mo18W.scaffolds.fasta
for f in *.map; do
gg1_ProcessMaizeGDBmaps.sh $f M018W;
done
stdout:
The example output for Mo18W
For M018W Chr 1, orignal file had 113 markers, but only 103 had information and 103 sequences were able to extract
For M018W Chr 2, orignal file had 75 markers, but only 65 had information and 65 sequences were able to extract
For M018W Chr 3, orignal file had 85 markers, but only 72 had information and 72 sequences were able to extract
For M018W Chr 4, orignal file had 76 markers, but only 64 had information and 64 sequences were able to extract
Feature (5:229199657-229199757) beyond the length of 5 size (217959525 bp). Skipping.
For M018W Chr 5, orignal file had 88 markers, but only 79 had information and 78 sequences were able to extract
For M018W Chr 6, orignal file had 49 markers, but only 39 had information and 39 sequences were able to extract
For M018W Chr 7, orignal file had 46 markers, but only 39 had information and 39 sequences were able to extract
For M018W Chr 8, orignal file had 56 markers, but only 43 had information and 43 sequences were able to extract
For M018W Chr 9, orignal file had 57 markers, but only 46 had information and 46 sequences were able to extract
For M018W Chr 10, orignal file had 35 markers, but only 31 had information and 31 sequences were able to extract
You will see .fasta
and .tsv
for each chr separately
gg2_MergeMarkers.sh M018W
No stdout, but you will see:
M018W_merged.fasta
M018W_merged.txt
as output
Map markers
gg3_MapMarkers.sh \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/Mo18W/Mo18W_index/Mo18W \
M018W_merged.fasta \
M018W_merged.txt M018W
No stdout, but few files are generated by this script.
M018W_mapped.sam
mapping_stats.txt
M018W_mapped.bam
M018W_sorted.bam
M018W_sorted-q30.bam
M018W_sorted_noMismatch.bam
M018W_part1.txt
M018W_part2.txt
M018W_GG-mapped.csv
The file M018W_GG-mapped.csv
is needed for the ALLMAPS step later, but we will renamed this file:
mv M018W_GG-mapped.csv goldengate.csv
First we need to prepare PG marker file (once for all NAM lines). File is located in CyVerse Data Commons that is accessible via iRods
icd /iplant/home/shared/panzea/genotypes/GBS/v27
iget Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz
or through direct link:
https://datacommons.cyverse.org/browse/iplant/home/shared/panzea/genotypes/GBS/v27/Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz
# wget/curl will not work
Once downloaded process the file downloaded as follows:
for f in {1..10}; do
awk -v x=$f '$3==x && $7==0 {print $5"\tpg_"NR"\t"$8}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.txt
# bed file to extract sequence
for f in {1..10}; do
awk -v x=$f '$3==x && $7==0 {print $5"\t"$6-50"\t"$6+50"\tpg_"NR"\t.\t+"}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.bed
# genome
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-22/fasta/zea_mays/dna/Zea_mays.AGPv3.22.dna.genome.fa.gz
gunzip Zea_mays.AGPv3.22.dna.genome.fa.gz
# marker sequence
ml bedtools2
awk '$2>0' pb_anchors.bed > temp
mv temp pb_anchors.bed
bedtools getfasta -fi Zea_mays.AGPv3.22.dna.genome.fa -fo pb_anchors.fasta -bed pb_anchors.bed -name
Mo18W example run:
gg3_MapMarkers.sh \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/Mo18W/Mo18W_index/Mo18W \
pb_anchors.fasta \
pb_anchors.txt \
M018W
same set of files are created as before. But you will only need M018W_GG-mapped.csv
. Rename this file
M018W_GG-mapped.csv M018W_pangenome.csv
gg4_clean-markers.sh M018W_pangenome.csv
this will create pangeome.csv
file.
Only if you find heterozygous scaffolds in your genome (based on optical map), you can construct a file with just the names of scaffolds that are heterozygous and remove them form the marker file. This will prevent them from including those scaffolds in the AGP.
remove-hets.sh hets-to-remove.txt
Run ALLMAPS
singularity exec --bind $PWD /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/jcvi.simg \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/98_runALLMAPS.sh \
pangenome.csv \
goldengate.csv \
Mo18W.scaffolds.fasta