Finding an Airbnb is difficult, especially one that's just right. Guests looking to stay in Dublin seem to share the same sentiment! In an effort to improve the guest experience on Airbnb, we analyzed two datasets from Strata covering Airbnb guests' searches in Dublin and their subsequent inquiries to hosts.
Our project consists of three main components: data visualization and analysis, machine learning, and a chatbot. In this Datathon, data analysis served as the foundation for the other parts of our project. Once we had identified which variables were significant to whether an Airbnb user's booking would be accepted, we built our machine learning model. The model takes its input variables from the dataset provided to us by Strata, but we didn't stop there. We went on to build a chatbot capable of providing recommendations to users based on the data we analyzed.
- We were initially given two separate datasets
- Each user had multiple entries, making it difficult to merge the datasets
- Combined into one set by user ID
- Outliers
- Removed outliers using a modified version of z-scores
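The write-up does not say which modification of the z-score was used. As a minimal sketch, assuming the common median-based variant (median and median absolute deviation instead of mean and standard deviation, with the conventional 3.5 cutoff), outlier removal on a single numeric column might look like this. The `nights` values are made up for illustration and are not taken from the Strata data.

```python
from statistics import median

def modified_z_scores(values):
    """Return a median/MAD-based z-score for each value (assumed variant)."""
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch then flags nothing as an outlier.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]

def remove_outliers(values, threshold=3.5):
    """Keep only values whose absolute modified z-score is within threshold."""
    scores = modified_z_scores(values)
    return [v for v, s in zip(values, scores) if abs(s) <= threshold]

# Hypothetical column: number of nights requested per inquiry.
nights = [2, 3, 3, 4, 2, 5, 3, 90]
cleaned = remove_outliers(nights)
```

Because the median and MAD are barely affected by a single extreme value, the 90-night entry is dropped while the ordinary stays are kept.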
We used R's built-in statistical functions and p-value tests to run some preliminary analysis on the composite data file. This allowed us to discover which variables influence the acceptance rate for Airbnb bookings. Using this information, we were able to generate informative plots using Seaborn and train our machine learning model.
Random Forest Classifier Model
- Compiled dataset
- Converted qualitative data to numerical using a scikit-learn LabelEncoder
- Output: Will a guest be accepted by a host? (1 for true, 0 for false)
- Inputs: Guest message time, host message time, check-in time, origin country
- Origin country had the biggest impact (accuracy jumped from 78% to 98%)
Overall accuracy: 98.18% // Training time: 41.61 ms
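The encoding step above can be illustrated without any dependencies. This is a rough sketch of what a label encoder does (each distinct category gets an integer index, assigned in sorted order), not the team's actual code; the function names and the sample country codes are hypothetical.

```python
def fit_label_encoding(values):
    """Map each distinct label to an integer index, in sorted label order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}

def transform(values, mapping):
    """Replace every label with its integer index from the mapping."""
    return [mapping[v] for v in values]

# Hypothetical sample of the origin-country column.
origin_countries = ["IE", "US", "FR", "US", "IE"]
mapping = fit_label_encoding(origin_countries)
encoded = transform(origin_countries, mapping)
```

Note that the integers carry no real ordering: "FR" is not "less than" "US" in any meaningful sense. Tree-based models such as random forests tend to tolerate this better than linear models, which may be one reason the approach worked here.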
Meet Bobby, an AI tool built on OpenAI's latest assistant technology and designed to analyze and visualize trends directly from datasets.
- Deep Data Analysis: Excels in extracting meaningful insights and patterns.
- Actionable Visualizations: Converts raw data into clear, actionable visual reports for strategic decision-making.
Preprocessing
- Finding a way to merge the datasets was difficult
- Users had multiple entries in both datasets, so rows were merged
- Numerical entries were averaged
- Categorical entries were appended to sets
- Preparing the data for the machine learning model was also a challenge
- All categorical data had to be converted to numerical representations
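The row-merging rule described in this section (average the numerical entries, collect the categorical entries into sets) can be sketched in plain Python as follows. The column names `id_user`, `n_nights`, and `origin_country` and the sample rows are assumptions for illustration only, not the actual schema of the Strata datasets.

```python
from collections import defaultdict
from statistics import mean

def merge_user_rows(rows, key="id_user"):
    """Collapse multiple rows per user into one record per user."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    merged = {}
    for user, user_rows in grouped.items():
        combined = {}
        fields = {f for r in user_rows for f in r if f != key}
        for field in fields:
            # Ignore missing values for this field.
            values = [r[field] for r in user_rows if r.get(field) is not None]
            if not values:
                combined[field] = None
            elif all(isinstance(v, (int, float)) and not isinstance(v, bool)
                     for v in values):
                # Numerical entries are averaged.
                combined[field] = mean(values)
            else:
                # Categorical entries are collected into a set.
                combined[field] = set(values)
        merged[user] = combined
    return merged

# Hypothetical rows: user "u1" appears three times, "u2" once.
rows = [
    {"id_user": "u1", "n_nights": 2, "origin_country": "IE"},
    {"id_user": "u1", "n_nights": 4, "origin_country": "IE"},
    {"id_user": "u1", "n_nights": 3, "origin_country": "GB"},
    {"id_user": "u2", "n_nights": 7, "origin_country": "US"},
]
merged = merge_user_rows(rows)
```

One consequence of this design is that a set-valued column still has to be reduced to a single numerical value before it can be fed to a model, which is part of why preparing the data for machine learning was a separate challenge.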
- Throughout the course of this Datathon, we've developed several skills:
- We now have a deeper understanding of machine learning techniques
- We understand how data analysis can be used to highlight problems and inspire solutions
- Most importantly, we learned to delegate work and collaborate effectively in a team setting
- We plan to use an expanded range of data
- The data we used is from 2014 (10 years ago!), so more recent data would be more relevant
- We will also further refine the ML model
- Potentially using a neural network