This project was completed as part of Udacity's Data Science Nanodegree. The aim of the project was to explore a particular data set, to identify three business questions and to use code to answer them. The three questions identified were:
- How do listing availability and price correlate over the course of the year?
- Which neighbourhoods in Seattle have the priciest listings, and how does this relate to each area's average listing rating?
- How can we predict the price of specific listings?
The project was developed in a Jupyter notebook, run from an Anaconda environment with Python 3.6. The packages used include:
- pandas
- matplotlib.pyplot
- seaborn
- numpy
- datetime
- sklearn.linear_model
- sklearn.metrics
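
As a rough illustration of how these packages might fit together for the price-prediction question, the sketch below fits a linear regression on a tiny made-up table and scores it in-sample. The column names (`bedrooms`, `accommodates`, `price`) and all values are hypothetical; they are not taken from the notebook or from the Kaggle data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical toy data standing in for a cleaned subset of listings.csv.
toy = pd.DataFrame({
    "bedrooms": [1, 2, 3, 4, 2, 1],
    "accommodates": [2, 4, 6, 8, 3, 1],
    "price": [80.0, 120.0, 180.0, 240.0, 110.0, 60.0],
})

X = toy[["bedrooms", "accommodates"]]  # assumed predictor columns
y = toy["price"]                       # target: nightly price

# Fit ordinary least squares and evaluate on the same rows (in-sample only).
model = LinearRegression().fit(X, y)
predictions = model.predict(X)
score = r2_score(y, predictions)
```

A real analysis would hold out a test set (for example with `sklearn.model_selection.train_test_split`) rather than scoring on the training rows.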
The Seattle Airbnb Data Science Project [Udacity].ipynb file contains the Jupyter notebook used to carry out the project, including the code written to answer the three business questions above. The data sets used are calendar.csv, listings.csv, and reviews.csv. These data sets include general attributes about each listing. They are publicly available on Kaggle.
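
A minimal sketch of how the availability and price question could be approached with pandas is shown below. It assumes a calendar-style table with `listing_id`, `date`, `available` (`t`/`f`) and `price` (a string such as `$85.00`) columns; that layout is an assumption here, and the rows are invented so the example runs without the Kaggle files.

```python
import io

import pandas as pd

# Invented sample rows mimicking the assumed calendar.csv layout.
sample_csv = io.StringIO(
    "listing_id,date,available,price\n"
    "1,2016-01-04,t,$85.00\n"
    "1,2016-01-05,f,\n"
    "2,2016-01-04,t,$120.00\n"
    "2,2016-02-01,t,$130.00\n"
    "1,2016-02-01,t,$90.00\n"
)

calendar = pd.read_csv(sample_csv, parse_dates=["date"])

# Convert the t/f flag to a boolean and the price string to a float.
calendar["available"] = calendar["available"] == "t"
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Share of available listing-days and mean listed price per calendar month.
monthly = calendar.groupby(calendar["date"].dt.month).agg(
    availability=("available", "mean"),
    mean_price=("price", "mean"),
)
```

With the real file, `sample_csv` would be replaced by the path to calendar.csv, and `monthly["availability"].corr(monthly["mean_price"])` would give one simple measure of how the two series move together.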
The findings for the three business questions are discussed in this Medium article by me.
Thanks to Udacity for creating such an awesome Data Science programme, and thanks to Kaggle for making the data sets available.