This folder contains the files and scripts that have been used in the correlation analyses between:
- pi vs. number of mapped traits.
- AF of all the populations per kb vs. number of mapped traits.
pi_m_trait_corr_file.tsv
: the source file used in the correlation analysis between the number of reported traits and pi values
pi_m_triats.py
: the script used to generate pi_m_trait_corr_file.tsv
gene.pi.AF.4fd.txt
: pi values file from a collaborator, copied here for convenience.
no_m_traits_af_final_single_file.tsv
: file used in the correlation analysis between the number of mapped traits and AF/kb
mart_export.txt
: exported file containing gene and transcript info based on the GRCh37 assembly
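
A minimal sketch of how a correlation between pi and the number of mapped traits could be computed from a tab-separated file such as pi_m_trait_corr_file.tsv. This is not the code in pi_m_triats.py. The column names "pi" and "n_traits" and the demo rows are made up for illustration, and Spearman rank correlation is an assumption; these notes do not say which coefficient was used.

```python
import csv
import io
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def read_columns(handle, x_col, y_col):
    """Read two numeric columns from a tab-separated file with a header row."""
    xs, ys = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    return xs, ys


# Made-up demo rows; the real file's column names may differ.
demo = io.StringIO(
    "gene\tpi\tn_traits\n"
    "G1\t0.001\t5\n"
    "G2\t0.002\t3\n"
    "G3\t0.003\t1\n"
)
pi_values, trait_counts = read_columns(demo, "pi", "n_traits")
rho = spearman(pi_values, trait_counts)
print("Spearman rho:", rho)
```

The same functions would apply to no_m_traits_af_final_single_file.tsv by passing its own column names (whatever they are) to read_columns.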
- do a correlation analysis of number of reported traits vs. gene length by adding gene length to the correlation file
- could include protein data, if a transcript has a protein product
- recheck the data and write a clear description of the data in the trait/AF correlation file:
- whether non-coding DNA was included
- whether I used the longest transcript of each gene
- how many genes, and how many transcripts?
- how was the data obtained, and what does it contain?
- start with the neurodegenerative diseases, then redo the correlation (see email from boss)
- it is also worth checking the methods used in GWAS Catalog studies (e.g. Affymetrix?)
- plot the data directly rather than using a histogram (see online OneNote report)
- age-related diseases should include: stroke, Alzheimer's disease, Parkinson's disease, type II diabetes, metabolic syndrome, obesity, cardiovascular disease, hypertension, age-related macular degeneration
- cancer should include: prostate cancer, colorectal cancer, ovarian cancer, pancreatic cancer and breast cancer
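
For the data recheck items above (longest transcript per gene, number of genes and transcripts), a minimal sketch that could be run on a tab-separated export such as mart_export.txt. The column names "gene_id", "transcript_id" and "transcript_length" and the demo rows are hypothetical; the real header names in the export need to be checked and passed in.

```python
import csv
import io


def longest_transcripts(handle, gene_col, transcript_col, length_col):
    """Return ({gene: (transcript, length)}, total number of transcript rows).

    For each gene, the longest transcript is kept. On a tie in length,
    the transcript seen first in the file is kept.
    """
    best = {}
    n_transcripts = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        n_transcripts += 1
        gene = row[gene_col]
        length = int(row[length_col])
        if gene not in best or length > best[gene][1]:
            best[gene] = (row[transcript_col], length)
    return best, n_transcripts


# Made-up demo rows; replace with open("mart_export.txt") and the real column names.
demo = io.StringIO(
    "gene_id\ttranscript_id\ttranscript_length\n"
    "G1\tT1\t1200\n"
    "G1\tT2\t3400\n"
    "G2\tT3\t800\n"
)
best, n_transcripts = longest_transcripts(
    demo, "gene_id", "transcript_id", "transcript_length"
)
print(len(best), "genes,", n_transcripts, "transcripts")
for gene, (transcript, length) in sorted(best.items()):
    print(gene, transcript, length)
```

The per-gene length from this table is one possible choice of "gene length" to add to the correlation file; using transcript length rather than genomic span is an assumption here.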
======================
- email the 1000 Genomes Project to enquire how to do the mapping (replied, see archived email)
- check whether pi values were from exomes
- should include an analysis for exomes, for accuracy
- fully characterize the data that I have derived
- correlations between genetic diversity and molecular evolution rate (dN/dS)