This project focuses on exploring and predicting the prices of Airbnb listings using factors like location, property type, availability, and reviews. Through Exploratory Data Analysis (EDA) and machine learning techniques, the goal was to uncover insights about price determinants and create a predictive model to estimate listing prices.
- Handled missing values in key features like `reviews_per_month` and `availability_365`.
- Addressed outliers in `price` and other numerical columns to improve data quality.
- Converted categorical features, such as `neighborhood` and `room_type`, into numerical representations for model compatibility.
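The missing-value and outlier steps can be sketched with pandas. This is a minimal illustration under stated assumptions, not the project's actual code: filling `reviews_per_month` with 0 (assuming a missing value means the listing has no reviews yet), imputing `availability_365` with its median, and capping `price` with the 1.5 × IQR rule are all assumed choices, and `clean_listings` is a hypothetical helper name.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute missing values and cap price outliers."""
    out = df.copy()
    # Assumption: a missing reviews_per_month means the listing has no reviews yet.
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    # Assumption: median imputation for availability_365.
    out["availability_365"] = out["availability_365"].fillna(out["availability_365"].median())
    # Cap price at the 1.5 * IQR fences (one common outlier heuristic).
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["price"] = out["price"].astype(float).clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


# Toy data for illustration only; not from the project's dataset.
listings = pd.DataFrame({
    "price": [50, 60, 70, 80, 10_000],
    "reviews_per_month": [1.0, None, 2.0, None, 0.5],
    "availability_365": [100, None, 200, 300, 50],
})
cleaned = clean_listings(listings)
print(cleaned["price"].max())  # 110.0: the 10,000 listing is capped at Q3 + 1.5 * IQR
```

Capping (winsorizing) keeps every listing in the data, whereas dropping outliers would shrink the sample; which is preferable depends on whether extreme prices are errors or genuine luxury listings.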
Listings in popular neighborhoods tend to have higher average prices. Proximity to central locations and tourist spots significantly impacts pricing.
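A neighborhood-level comparison like this can be sketched with a pandas group-by. The sketch assumes the cleaned data has `neighborhood` and `price` columns; the neighborhood names and prices below are toy values, and `price_by_neighborhood` is a hypothetical helper. It does not address the proximity claim, which would additionally need listing coordinates.

```python
import pandas as pd


def price_by_neighborhood(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize listing prices per neighborhood, most expensive first."""
    summary = df.groupby("neighborhood")["price"].agg(["median", "mean", "count"])
    # Median is less sensitive to remaining price outliers than the mean.
    return summary.sort_values("median", ascending=False)


toy = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 200, 50, 70, 90],
})
print(price_by_neighborhood(toy))
```

Keeping the `count` column next to the averages helps flag neighborhoods whose ranking rests on only a handful of listings.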
Prices exhibit seasonal trends, with peaks during holidays and vacation periods.
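Seasonality needs prices observed on specific dates, not a single price per listing. The sketch below assumes a hypothetical table `calendar` with one row per listing-night and columns `date` and `price`; the helper name `monthly_price_index` and the toy values are illustrative only.

```python
import pandas as pd


def monthly_price_index(calendar: pd.DataFrame) -> pd.Series:
    """Average nightly price per calendar month, relative to the overall average.

    Assumes a hypothetical table with one row per listing-night and
    columns `date` and `price`.
    """
    months = pd.to_datetime(calendar["date"]).dt.month.rename("month")
    monthly = calendar.groupby(months)["price"].mean()
    # Values above 1.0 mark months priced above the overall average nightly price.
    return monthly / calendar["price"].mean()


toy_calendar = pd.DataFrame({
    "date": ["2023-01-05", "2023-01-20", "2023-07-01", "2023-07-02"],
    "price": [100, 100, 200, 200],
})
print(monthly_price_index(toy_calendar))
```

Expressing each month relative to the overall average makes peaks comparable across cities with very different price levels.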
Properties with more reviews and higher ratings tend to be priced higher, suggesting that customer trust, as reflected in reviews and ratings, is associated with price.
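One way to quantify this relationship is a Spearman rank correlation, which captures monotonic rather than strictly linear trends and is less affected by skewed prices. The column names `number_of_reviews` and `review_scores_rating` are assumptions (the project's review columns may be named differently), and `review_price_correlation` is a hypothetical helper. A positive value shows association only, not that reviews cause higher prices.

```python
import pandas as pd


def review_price_correlation(
    df: pd.DataFrame,
    review_cols=("number_of_reviews", "review_scores_rating"),
) -> pd.Series:
    """Spearman rank correlation of each (assumed) review column with price."""
    cols = [*review_cols, "price"]
    return df[cols].corr(method="spearman")["price"].drop("price")


toy = pd.DataFrame({
    "number_of_reviews": [1, 2, 3, 4],
    "review_scores_rating": [4.0, 4.5, 4.2, 4.9],
    "price": [50, 60, 70, 80],
})
print(review_price_correlation(toy))
```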
- Standardized numerical features, such as `price` and `minimum_nights`.
- Encoded categorical variables for use in machine learning models.
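Scaling and encoding can be combined in a single scikit-learn `ColumnTransformer`, so the same transformations are applied at training and prediction time. This is a sketch: the feature lists below are assumptions, and `price` is left out because it serves as the prediction target in the modeling step. If the target is standardized as well, scikit-learn's `TransformedTargetRegressor` is one way to do that while still reporting predictions in dollars.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature lists; the project's exact columns may differ.
NUMERIC = ["minimum_nights", "availability_365", "reviews_per_month"]
CATEGORICAL = ["neighborhood", "room_type"]


def build_preprocessor() -> ColumnTransformer:
    """Standardize numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERIC),
            # handle_unknown="ignore" encodes unseen categories as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


toy = pd.DataFrame({
    "minimum_nights": [1, 2, 3],
    "availability_365": [10, 20, 30],
    "reviews_per_month": [0.5, 1.0, 1.5],
    "neighborhood": ["A", "B", "A"],
    "room_type": ["Entire home/apt", "Private room", "Private room"],
})
X = build_preprocessor().fit_transform(toy)
print(X.shape)  # (3, 7): 3 scaled numeric columns + 2 + 2 one-hot columns
```

One-hot encoding avoids imposing an artificial order on neighborhoods, which integer label encoding would do for a linear model.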
- Models tested:
  - Linear Regression: a baseline model for interpretability.
  - Random Forest Regressor: captured non-linear relationships effectively.
- Best model: the Random Forest Regressor achieved the best performance, with the following metrics:
  - R² score: 0.78
  - Mean Absolute Error (MAE): $30.20
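A comparison of the two models on R² and MAE can be sketched with scikit-learn. The 80/20 holdout split, `n_estimators`, and `random_state` are assumptions (the summary does not say how the reported metrics were obtained), `compare_models` is a hypothetical helper, and the data is synthetic, so its scores say nothing about the Airbnb results. MAE is in the units of `y`, so it reads as dollars only when `y` is the unscaled price.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=42):
    """Fit both models on a train split and report R² and MAE on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest Regressor": RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {"r2": r2_score(y_test, pred), "mae": mean_absolute_error(y_test, pred)}
    return results


# Synthetic data with a non-linear term, for illustration only (not the Airbnb data).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 100 + 50 * np.sin(6 * X[:, 0]) + 20 * X[:, 1] + rng.normal(0, 5, size=500)
for name, scores in compare_models(X, y).items():
    print(f"{name}: R²={scores['r2']:.2f}, MAE={scores['mae']:.2f}")
```

Because the synthetic target contains a sine term, the linear baseline underfits while the forest can follow the curve, which mirrors the kind of non-linearity the summary credits the forest with capturing.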
- Neighborhood and property type emerged as key predictors of listing price.
- The machine learning model provides a foundation for Airbnb hosts to estimate competitive prices.
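One way to check which original features drive a random forest's predictions is to sum scikit-learn's impurity-based `feature_importances_` over the one-hot columns derived from each feature. This sketch assumes dummy column names of the form `<feature>_<category>` (as `pd.get_dummies` produces); `grouped_importances` is a hypothetical helper and the data is synthetic. A feature whose name prefixes another (for example `neighborhood` and `neighborhood_group`) would be double-counted, and impurity importances can favor continuous or high-cardinality features, so `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def grouped_importances(model, feature_names, groups):
    """Sum impurity-based importances of one-hot columns back to their source feature."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    totals = {
        g: imp[[n for n in feature_names if n == g or n.startswith(g + "_")]].sum()
        for g in groups
    }
    return pd.Series(totals).sort_values(ascending=False)


# Synthetic example: price driven mainly by neighborhood (illustrative data only).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "neighborhood": rng.choice(["A", "B", "C"], size=300),
    "minimum_nights": rng.integers(1, 31, size=300),
})
price = df["neighborhood"].map({"A": 200, "B": 100, "C": 50}) + rng.normal(0, 5, size=300)
X = pd.get_dummies(df, columns=["neighborhood"], dtype=float)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)
print(grouped_importances(rf, list(X.columns), ["neighborhood", "minimum_nights"]))
```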
- Programming Language: Python
- Libraries: pandas, NumPy, Seaborn, Matplotlib, scikit-learn
- Data Quality Issues: Missing values and outliers required extensive cleaning.
- Feature Selection: Balancing model complexity with the interpretability of features.
- Expand the dataset to include listings from additional cities for broader insights.
- Explore advanced modeling techniques to improve prediction accuracy.
- Develop visual dashboards for better user interaction with the model results.