Salaries data analysis using Pandas.
- Import pandas as pd
- Read Salaries.csv as a DataFrame called salary
- Check the head of the DataFrame
- Use the .info() method to find out how many entries there are.
- What is the average BasePay?
- What is the highest amount of OvertimePay in the dataset?
- What is the job title of a particular person, e.g. GARY JIMENEZ?
- How much does a particular person make, e.g. GARY JIMENEZ (including benefits)?
- What is the name of the highest-paid person (including benefits)?
- What is the name of the lowest-paid person (including benefits)?
- What was the average (mean) BasePay of all employees per year (2011-2014)?
- How many unique job titles are there?
- What are the top 5 most common jobs?
- How many job titles were held by only one person in 2013 (i.e. job titles with only one occurrence in 2013)?
- How many people have the word Chief in their job title?
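The questions above map onto a handful of pandas calls. The sketch below is a minimal illustration under stated assumptions, not the notebook's actual code: it builds a small made-up DataFrame instead of reading Salaries.csv, and it assumes the columns are named EmployeeName, JobTitle, BasePay, OvertimePay, TotalPayBenefits and Year (only BasePay and OvertimePay are named in this README; the rest are assumptions). The employee names and amounts are hypothetical placeholders.

```python
import pandas as pd

# Made-up stand-in for Salaries.csv. Column names are ASSUMED, values are
# hypothetical. With the real file you would write:
#     salary = pd.read_csv("Salaries.csv")
salary = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "EMPLOYEE B", "EMPLOYEE C", "EMPLOYEE D", "EMPLOYEE E"],
    "JobTitle": ["Chief of Police", "Transit Operator", "Transit Operator", "Deputy Chief", "Nurse"],
    "BasePay": [300000.0, 60000.0, 62000.0, 200000.0, 90000.0],
    "OvertimePay": [0.0, 10000.0, 12000.0, 0.0, 5000.0],
    "TotalPayBenefits": [350000.0, 80000.0, 85000.0, 250000.0, 120000.0],
    "Year": [2013, 2013, 2014, 2013, 2014],
})

print(salary.head())  # first five rows
salary.info()         # number of entries, dtypes, non-null counts

avg_base = salary["BasePay"].mean()          # average BasePay
max_overtime = salary["OvertimePay"].max()   # highest OvertimePay

# Look up one person by name (hypothetical name used here)
person = salary["EmployeeName"] == "EMPLOYEE C"
title = salary.loc[person, "JobTitle"]
total = salary.loc[person, "TotalPayBenefits"]

# idxmax/idxmin return the row label of the extreme value; .loc pulls the name
highest_paid = salary.loc[salary["TotalPayBenefits"].idxmax(), "EmployeeName"]
lowest_paid = salary.loc[salary["TotalPayBenefits"].idxmin(), "EmployeeName"]

# Mean BasePay per year
base_by_year = salary.groupby("Year")["BasePay"].mean()

# Unique job titles and the most common ones
n_unique_titles = salary["JobTitle"].nunique()
top5 = salary["JobTitle"].value_counts().head(5)

# Job titles that occur exactly once among the 2013 rows
counts_2013 = salary.loc[salary["Year"] == 2013, "JobTitle"].value_counts()
single_2013 = int((counts_2013 == 1).sum())

# Rows whose job title contains "chief", ignoring case
n_chief = int(salary["JobTitle"].str.contains("chief", case=False).sum())
```

Swapping the made-up frame for `pd.read_csv("Salaries.csv")` should let the same calls run against the real data, provided the column names match the assumptions above.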
For this demonstration, I am using Jupyter Notebook, an open-source web application that lets you create and share documents containing live code, equations, visualizations and narrative text.
Create a virtual environment and install the dependencies listed in requirements.txt:
$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate
$ pip install -r requirements.txt
Start the Jupyter Notebook server:
$ jupyter notebook