Data analysis repository for the Udacity Data Scientist Nanodegree blog post project.
Additional library requirements are listed in 'requirements.txt'. The code should run without issues on any Python 3.* version.
For this project I was interested in international health statistics to better understand:
- What factors most affect life expectancy?
- Which countries have the highest and lowest life expectancy, and why?
- Can life expectancy be predicted based on the data available here?
- .ipynb files: contain all of the working code
- vis directory: contains .json files of the Plotly visualisations generated in the Jupyter notebooks
- Life_Expectancy_Data.csv: obtained from Kaggle (see acknowledgements)
- Life_Expectancy_Data_Cleaned.csv: generated by the Jupyter notebooks; the cleaned version of Life_Expectancy_Data.csv
- requirements.txt: Python library requirements to run the notebooks within this repository
- life_expectancy_data_profiling_report.html: generated by the pandas_profiling library; summarises the columns in the dataset
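The saved visualisations in the vis directory can be inspected outside the notebooks. The following is a minimal sketch, not code from this repository: the helper names `list_figure_files`, `load_figure_dict` and `show_figure` are hypothetical, and it assumes each .json file holds a figure exported in Plotly's JSON format.

```python
import json
from pathlib import Path


def list_figure_files(vis_dir="vis"):
    """Return the .json files found directly inside vis_dir, sorted by name."""
    return sorted(Path(vis_dir).glob("*.json"))


def load_figure_dict(path):
    """Parse one saved file into plain Python objects using only the standard library."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def show_figure(path):
    """Render one saved figure.

    Assumption: the file contains Plotly figure JSON. Plotly is imported
    lazily so the two helpers above still work when it is not installed.
    """
    import plotly.io as pio

    fig = pio.read_json(str(path))
    fig.show()
```

Calling `list_figure_files()` from the repository root should return the paths of the saved figures, and passing one of them to `show_figure` should render it in a browser or notebook, provided the assumption about the file format holds.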
The results of this analysis can be found in a blog post here: https://medium.com/@jacob.punter/life-expectancy-how-does-a-countries-development-factors-affect-it-ef397ced579a
The Life_Expectancy_Data.csv dataset was obtained from Kaggle: https://www.kaggle.com/kumarajarshi/life-expectancy-who Credit for this dataset goes to the World Health Organization, which collected the data. All code within this repository was developed solely for this project and is available for public use. Functions developed with help from other sites give credit in their documentation.