Rishi De-Kayne edited this page Jun 13, 2022 · 7 revisions

Welcome to the Alpine_whitefish_WGS wiki. This document outlines all the scripts used for De-Kayne et al. XXXX

Two main scripts contain the bash/cluster commands for the analyses:

genotyping_commands_99_analysis.txt contains the commands to get from the raw fastq files to the key VCF (99indiv_15mil_SNPs_output_filt_mac3_miss1_qual30_mindepth3_maxdepth50.vcf.gz), which is used for the rest of the analyses
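The name of the key VCF appears to record the filters applied to it. As a minimal sketch (not taken from genotyping_commands_99_analysis.txt), the following POSIX shell snippet splits the part of the filename after `_filt_` into name/value pairs. The function name `parse_filters` is hypothetical, and reading the tokens as minor allele count, missingness, quality and depth thresholds is an assumption based only on the filename; the exact settings should be checked in the genotyping commands file.

```shell
#!/bin/sh
# Key VCF filename as given in this wiki.
vcf="99indiv_15mil_SNPs_output_filt_mac3_miss1_qual30_mindepth3_maxdepth50.vcf.gz"

# parse_filters (hypothetical helper): print one "name<TAB>value" line
# for each filter token that follows "_filt_" in the filename.
parse_filters() {
  base="${1%.vcf.gz}"          # drop the .vcf.gz extension
  filters="${base#*_filt_}"    # keep only the text after "_filt_"
  printf '%s\n' "$filters" | tr '_' '\n' | while IFS= read -r token; do
    name="${token%%[0-9]*}"    # leading letters, e.g. "mac"
    value="${token#"$name"}"   # trailing digits, e.g. "3"
    printf '%s\t%s\n' "$name" "$value"
  done
}

parse_filters "$vcf"
```

For the filename above this prints five lines: mac 3, miss 1, qual 30, mindepth 3 and maxdepth 50.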

all_commands_99_analysis.txt contains information on all subsequent analyses (the corresponding R script for each analysis is listed below)

All R scripts and some bash scripts read background information on the individuals from one key file:

background_2021_99.csv holds this information for each individual, including individual number, species, lake, gill raker count, sex, sequencing depth, and SRA accession (to be filled upon publication)

The analyses and their corresponding analysis-specific R scripts are:

  1. PCAs - all_commands_99_analysis_01.R
  • 1.1 Full 99 dataset (without outgroups)
  • 1.2 Individual lakes
  2. RAxML
  3. Admixture - all_commands_99_analysis_03.R
  4. Dsuite - dataset-wide f-branch stats - Dsuite_tree_parser.R and parse_z_scores.R
  5. GWAS - all_commands_99_analysis_05.R
  • 5.1 Gill raker count
  • 5.2 Sex
  6. Parallel study
  • 6.1 PCA - all_commands_99_analysis_rev_response_06.R
  • 6.2 Fst - all_commands_99_analysis_rev_response_06.R
  • 6.3 CSS - all_commands_99_analysis_rev_response_06.R
  • 6.4 Outlier analysis - all_commands_99_analysis_rev_response_06_4.R and CS_outliers_response_1659.R
  7. f4 statistics - all_commands_99_analysis_07.R
  8. KEGG differences for Fst outlier overlaps - all_commands_99_analysis_09.R

Tests of how gene length relates to the enrichment of GO outliers (CSS and FST) can be found here: GO_enrichment_length.R

Tests of phenotypic variation can be found here: Phenotypes.R

Details on ENA upload of raw data can be found in: ena_upload.txt and fastq2_filled_RDK.tsv