In today's challenge, we will be working with one of the most classic data sets in data science and machine learning: the Titanic. Today, we're going to use data to figure out who survived and who did not survive on the Titanic. Maybe the conclusions you come to here will help you survive on your next cruise.
To accomplish this, we'll first read in the CSV and clean the data. Then, we'll query the data set and come to some conclusions based on the data.
Read the CSV file into `titanic_1.py` using `pandas.read_csv()`.
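As a starting point, here is a minimal sketch of the loading step. The helper name `load_titanic` and the in-memory sample rows are assumptions made for illustration, and the column headers simply mirror the names used in this challenge; the real file may spell them differently.

```python
import io

import pandas as pd


def load_titanic(source):
    """Read the Titanic CSV from a file path or file-like object."""
    # load_titanic is a hypothetical helper, not part of the challenge files.
    return pd.read_csv(source)


# Illustrative stand-in rows so the sketch runs without the real file.
sample_csv = io.StringIO(
    "survived,Pclass,Name,sex,age,price\n"
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,71.28\n'
    '0,3,"Moran, Mr. James",male,,8.46\n'
)

df = load_titanic(sample_csv)
print(df.head())        # first rows, to confirm the headers parsed as expected
print(df.isna().sum())  # count of missing values per column
```

In `titanic_1.py` you would pass the path to your copy of the CSV instead of `sample_csv`. Checking `df.isna().sum()` right after loading shows which columns will need cleaning.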
There are two things that we need to do to clean the data:
- Some rows contain missing values for age. Delete these rows entirely because they're incomplete rows.
- Separate the current `Name` column into `LastName`, `FirstName`, and `Title`. You will notice that some names contain nicknames or other parentheticals after them, so be careful in your string slicing to make sure the rows are what you want them to be.
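The two cleaning steps above can be sketched as follows. This is one possible approach, not the only one: it assumes names follow the pattern `Last, Title. First ...`, that the columns are called `Name` and `age` as in this challenge, and it uses a regular expression instead of manual string slicing. The helper `clean_titanic` and the sample rows are hypothetical.

```python
import pandas as pd

# Assumed layout: "Last, Title. First (parenthetical)" with optional quoted nickname.
NAME_PATTERN = r"^(?P<LastName>[^,]+),\s*(?P<Title>[^.]+)\.\s*(?P<FirstName>.*)$"
# Matches parentheticals like (Florence Briggs Thayer) and nicknames like "Annie".
EXTRAS_PATTERN = r'\s*(?:\([^)]*\)|"[^"]*")'


def clean_titanic(df):
    """Drop rows without an age and add LastName, FirstName, and Title columns."""
    # Work on a copy so the caller's DataFrame is left untouched.
    cleaned = df.dropna(subset=["age"]).copy()
    parts = cleaned["Name"].str.extract(NAME_PATTERN)
    # Remove nicknames and parentheticals, then trim leftover whitespace.
    parts["FirstName"] = (
        parts["FirstName"].str.replace(EXTRAS_PATTERN, "", regex=True).str.strip()
    )
    parts["LastName"] = parts["LastName"].str.strip()
    parts["Title"] = parts["Title"].str.strip()
    # The original Name column is kept so the split can be checked by eye.
    return cleaned.join(parts[["LastName", "FirstName", "Title"]])


# Illustrative rows only; the last one has no age and should be dropped.
raw = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            'McGowan, Miss. Anna "Annie"',
            "Moran, Mr. James",
        ],
        "age": [22.0, 38.0, 15.0, None],
    }
)

cleaned = clean_titanic(raw)
print(cleaned[["LastName", "FirstName", "Title"]])
```

Rows whose name does not match the assumed pattern come back as missing values in the three new columns, so it is worth checking `cleaned["LastName"].isna().sum()` on the real data before moving on.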
Now that you have clean data, let's do some Pandas queries:
- Get all rows
- Get all rows and display the values for `survived` and `sex`
- Get all rows where `sex` is `female`
- The mean of `survived` where `sex` is `male`
- The mean of `survived` where `sex` is `female`
- The mean of `survived` where `age` is less than 45
- The mean of `survived` where `age` is greater than 70
- The mean of `survived` where `sex` is `male` and `age` is less than 30
- The mean of `survived` where `sex` is `female` and `age` is less than 40
- The mean of `survived` where `Pclass` is 1
- The mean of `survived` where `Pclass` is 3
- The mean of `survived` where `sex` is `female` and `Pclass` is 1 and `age` is less than 35
- The standard deviation of `price` where `Pclass` is 1
- The standard deviation of `price` where `Pclass` is 3
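Most of the queries above follow the same pattern: build a boolean mask, select a column, and aggregate. The sketch below shows that pattern on a few representative queries, leaving the rest to you. The sample values are made up for illustration, the column names are the ones used in this challenge, and `survived` is assumed to be coded as 0 or 1 so that its mean is the fraction of survivors.

```python
import pandas as pd

# Made-up rows for illustration; use your cleaned DataFrame instead.
df = pd.DataFrame(
    {
        "survived": [1, 0, 1, 0, 1, 1],
        "sex": ["female", "male", "female", "male", "male", "female"],
        "age": [29, 40, 2, 71, 25, 50],
        "Pclass": [1, 3, 1, 3, 1, 3],
        "price": [100.0, 8.0, 150.0, 7.0, 80.0, 10.0],
    }
)

# Select a subset of columns for every row.
survived_and_sex = df[["survived", "sex"]]

# Filter rows with a boolean mask.
females = df[df["sex"] == "female"]

# Mean of one column over the rows that match a single condition.
male_rate = df.loc[df["sex"] == "male", "survived"].mean()

# Combine conditions with & and wrap each one in parentheses.
mask = (df["sex"] == "female") & (df["Pclass"] == 1) & (df["age"] < 35)
female_first_young_rate = df.loc[mask, "survived"].mean()

# Standard deviation; pandas uses the sample formula (ddof=1) by default.
first_class_price_std = df.loc[df["Pclass"] == 1, "price"].std()

print(male_rate, female_first_young_rate, first_class_price_std)
```

Note that Python's `and` does not work element-wise on pandas Series; use `&` (and `|` for "or"), with each comparison in its own parentheses.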
What conclusions can you draw from the data you queried above about who survived the Titanic?