The project uses the 515K Hotel Reviews dataset from the DataScience Challenge (Reveal data secrets) hosted by Dr. Doaa Mahmoud Abdel-at.
The goal is to build data visualizations and perform useful analyses, such as finding the top hotels in a city, or to extract other insights, such as popularity over time.
The dataset is obtained from Kaggle.
This dataset contains 515,000 reviews and scores for 1,493 luxury hotels across Europe, together with each hotel's geographical location. It includes the following features: Hotel Name, Hotel Address, Hotel Score, Review Date, Reviewer Nationality, Positive Review, Negative Review, Reviewer Score, Tags, and Days Since Review.
It is a large collection of guest reviews of hotels in Western Europe: more than 500,000 reviews for 1,493 luxury hotels. The guests come from more than 200 nationalities, about 50% of the reviews are for hotels in the United Kingdom, and every review has both a positive and a negative part.
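As a rough illustration of how the country and nationality breakdowns above can be computed, here is a minimal, standard-library sketch. It assumes the hotel's country is the last word of the Hotel Address (two words for the United Kingdom); the sample rows and the helper names `hotel_country` and `share_by` are hypothetical, not the project's code.

```python
from collections import Counter

# Hypothetical (Hotel Address, Reviewer Nationality) pairs; illustrative only.
SAMPLE_ROWS = [
    ("1 Example Street 1000 AA Amsterdam Netherlands", " Russia "),
    ("2 Example Road London W1 1AA United Kingdom", " United Kingdom "),
    ("3 Example Avenue 08001 Barcelona Spain", " United Kingdom "),
    ("4 Example Lane London SW3 5QY United Kingdom", " Australia "),
]


def hotel_country(address: str) -> str:
    """Return the country at the end of a hotel address.

    Assumption: the country is the last word of the address, except for
    'United Kingdom', which spans two words.
    """
    address = address.strip()
    if address.endswith("United Kingdom"):
        return "United Kingdom"
    return address.split()[-1]


def share_by(values):
    """Return each distinct value's share of the total as a fraction in [0, 1],
    ordered from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.most_common()}


country_share = share_by(hotel_country(address) for address, _ in SAMPLE_ROWS)
# Strip surrounding whitespace defensively before counting nationalities.
nationality_share = share_by(nationality.strip() for _, nationality in SAMPLE_ROWS)

print(country_share)
print(nationality_share)
```

On the full dataset, the same counting gives the share of reviews per hotel country and the spread of reviewer nationalities.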
We will explore the data, clean it up, work on the analysis, and then choose the appropriate prediction model.
If you want to visit Spain, Austria, Italy, France, the Netherlands, or the United Kingdom, for leisure or business, alone or with friends, we will help you discover the best hotels through more than 500K Booking.com reviews of hotels in Europe.
- Analyze the dataset with a focus on the reviewers' nationality.
- Find the best and worst hotels in Europe.
- Develop a recommendation system that returns a list of recommended hotels based on the user's hotel selection.
- Develop a recommendation system that shows the best and worst reviews of the selected hotel.
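For the best/worst hotel goal, one simple approach is to rank hotels by their mean Reviewer Score while ignoring hotels with too few reviews. The sketch below assumes a hypothetical minimum of two reviews and uses made-up hotel names and scores; the project's actual ranking criteria may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Hotel Name, Reviewer Score) pairs; illustrative only.
REVIEWS = [
    ("Hotel A", 9.6), ("Hotel A", 8.8), ("Hotel A", 9.2),
    ("Hotel B", 5.4), ("Hotel B", 6.3), ("Hotel B", 4.8),
    ("Hotel C", 7.9), ("Hotel C", 8.3),
    ("Hotel D", 10.0),  # a single review: excluded by min_reviews below
]


def rank_hotels(reviews, min_reviews=2):
    """Return (hotel, mean score, review count) tuples sorted best to worst.

    Hotels with fewer than `min_reviews` reviews are dropped so that a single
    enthusiastic or angry reviewer cannot dominate the ranking. The threshold
    is an assumption, not a value taken from the project.
    """
    scores = defaultdict(list)
    for hotel, score in reviews:
        scores[hotel].append(score)
    ranked = [
        (hotel, round(mean(values), 2), len(values))
        for hotel, values in scores.items()
        if len(values) >= min_reviews
    ]
    # Sort by mean score, highest first; break ties by review count.
    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


ranking = rank_hotels(REVIEWS)
print("Best:", ranking[0])
print("Worst:", ranking[-1])
```

The minimum-review filter is the main design choice here: without it, a hotel with one perfect score would top the list.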
- Business Questions
- EDA
- Data Cleaning
- Feature Engineering
- Natural Language Processing (NLP) techniques:
  - Term frequency–inverse document frequency (TF-IDF) encoding
  - Sentiment analysis
- Model building and evaluation:
  - First approach: a model that predicts customer satisfaction (satisfied or not) based on the reviewer score.
  - Second approach: a sentiment analysis model based on the positive and negative reviews.
- Cosine Similarity
- Content-based recommendation system
- Interactive Streamlit app
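The TF-IDF, cosine similarity, and content-based recommendation steps can be sketched together. The version below is written from scratch with the standard library to stay self-contained (the project may instead use a library such as scikit-learn); the per-hotel text, hotel names, and function names are hypothetical. Each hotel is represented by a TF-IDF vector of its text, and the recommender returns the hotels whose vectors are most cosine-similar to the selected one.

```python
import math
from collections import Counter

# Hypothetical per-hotel "documents" (e.g. review text or tags joined per hotel).
HOTEL_DOCS = {
    "Hotel A": "quiet room friendly staff great breakfast near station",
    "Hotel B": "friendly staff great breakfast clean room",
    "Hotel C": "noisy bar loud music small room",
    "Hotel D": "rooftop pool spa luxury suite",
}


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document.

    tf is the term count divided by document length; idf uses the smoothed
    form log((1 + N) / (1 + df)) + 1, where df is the number of documents
    containing the term.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(selected, docs, top_n=2):
    """Return the `top_n` hotels most similar to `selected`, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (other, cosine(vectors[selected], vectors[other]))
        for other in vectors
        if other != selected
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


print(recommend("Hotel A", HOTEL_DOCS))
```

Calling `recommend("Hotel A", HOTEL_DOCS)` ranks Hotel B first, since it shares most of Hotel A's terms; Hotel D shares none, so its similarity is 0 and it falls outside the top two.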
You can try them on Kaggle:
1. Data analysis and EDA
2. Business questions and solutions
3. Modeling approach 1
4. Modeling approach 2
As part of the project, I deployed the app using Streamlit. We learned the steps to take when carrying out an end-to-end project, including refactoring code into functions so that it is easier to compile and run. To deploy a model, you need to understand what you want to achieve, then revisit the code and see how it could be restructured to achieve it.
You may access the deployed site here.
ElsayedALy | Nourshosharah