ianlaiky/SC1015-Team-7-Movie-Popularity

Welcome to Team 7's repository

Content page

  1. About 🙋🏻‍♂️
  2. Contributors 🧙🏻‍♀️🧙🏻
  3. Problem Definition🕵🏻
  4. Models Used 🙆🏻‍♂️
  5. Conclusion 👨🏻‍💻
  6. Project Takeaway 👨🏻‍🎓
  7. References

About 🙋🏻‍♂️

COVID-19 has had a significant impact on our society over the past 2-3 years, and this is especially true for the film industry. Cinemas were closed to prevent the spread of the virus, and today they require patrons to leave gaps between one another, which reduces the potential earnings of each screening.

This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence) that focuses on how movie producers can maximise their profit.

Contributors 🧙🏻‍♀️🧙🏻

  • @ianlaiky - Ian Lai Kheng Yan - Data Extraction, Machine Learning
  • @drainboy - Koh Jia Sheng Eldrian - Exploratory data analysis
  • @kavi-99 - Kavita Sriram - Exploratory data analysis

Problem Definition

  • How do different variables affect the revenue of a movie?

Models Used 🙆🏻‍♂️

Random Forest

Multivariate Linear Regression

XGBoost

Conclusion 👨🏻‍💻

Data-driven insights and recommendations:

  • Movies released in July tend to bring in high revenue.

  • Including elements of action, adventure, history, or fantasy in the story is likely to be beneficial, as those genres have been shown to generate high revenues.

  • Keeping the runtime within the 50-150 minute range is optimal for ensuring high revenues.

  • Popularity has a moderately strong correlation with revenue, so movie producers should make an effort to market their movie well and raise its popularity.

  • Casting actors with a consistently successful track record may improve the chances of generating higher revenue.
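
To make the popularity and revenue point concrete, here is a minimal sketch of how a Pearson correlation coefficient can be computed. The numbers are made-up illustrative values, not rows from our dataset, and `pearson_r` is a hypothetical helper rather than the code used in our notebooks.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values (assumption), not taken from the project data.
popularity = [5.0, 12.0, 20.0, 33.0, 41.0]
revenue_musd = [10.0, 55.0, 30.0, 120.0, 90.0]

r = pearson_r(popularity, revenue_musd)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 indicates a strong positive linear relationship, a value close to 0 indicates little or no linear relationship, and a value close to -1 indicates a strong negative one.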

Project Outcomes and Conclusions:

  • The models predict a movie's revenue from factors known while the movie is being made.

  • Random forest and multivariate linear regression gave the most similar accuracy; however, linear regression had the smallest gap between its train and test accuracy.

  • The only moderate accuracy of our models may be attributed to movies not releasing their budget and revenue information, which reduced our original 140k rows of data to 2747 rows. The smaller dataset may have affected the accuracy of our models.
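
The row reduction described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the column names `budget` and `revenue`, the rule that a missing value shows up as empty, zero, or non-numeric, and the helper names `has_financials` and `filter_rows` are all hypothetical and may differ from our actual cleaning step.

```python
import csv
import io

def has_financials(row):
    """True only if both budget and revenue parse as positive numbers."""
    for field in ("budget", "revenue"):
        try:
            value = float(row.get(field) or 0)
        except ValueError:
            return False  # non-numeric text such as "abc"
        if not value > 0:  # also rejects 0, negatives and NaN
            return False
    return True

def filter_rows(rows):
    """Keep only the rows usable for revenue modelling."""
    return [row for row in rows if has_financials(row)]

# Tiny made-up sample (assumption), standing in for the extracted CSV.
sample_csv = io.StringIO(
    "title,budget,revenue\n"
    "Movie A,1000000,5000000\n"
    "Movie B,0,0\n"
    "Movie C,,250000\n"
    "Movie D,abc,100\n"
    "Movie E,2000000,1500000\n"
)
rows = list(csv.DictReader(sample_csv))
kept = filter_rows(rows)
print(f"kept {len(kept)} of {len(rows)} rows")  # kept 2 of 5 rows
```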

Project Takeaway 👨🏻‍🎓

What did we learn?

New Models

Exploring other types of models, such as:

  • Random Forest Regression
  • XGBoost

Collaboration

  • Collaborating using Google Colab

Data Extraction

  • Using the TMDB API to extract data from the TMDB website, with a Python script that extracts the data and saves it to a CSV file.
  • Handling occasional crashes due to unstable internet, and resuming the extraction process through error checking.
  • Handling random erroneous strings that cause data in the CSV file to be 'corrupted'.
  • Using tmdbv3api to extract data from the TMDB website (https://github.com/AnthonyBloomer/tmdbv3api).
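
The points above can be sketched as a resumable extraction loop. This is a minimal sketch, not our actual script: `fetch_movie` is a hypothetical callable standing in for whichever TMDB call is used, the field list is an assumption, and treating `OSError` (and its subclasses such as `ConnectionError`) as the network failure type is also an assumption.

```python
import csv
import os
import tempfile
import time

FIELDS = ["id", "title", "budget", "revenue"]  # assumed columns

def clean(value):
    """Drop NUL characters and flatten line breaks so a record stays on one CSV line."""
    if isinstance(value, str):
        return " ".join(value.replace("\x00", "").split())
    return value

def already_saved_ids(path):
    """IDs already present in the CSV, so a restarted run can skip them."""
    if not os.path.exists(path):
        return set()
    ids = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if (row.get("id") or "").isdigit():
                ids.add(int(row["id"]))
    return ids

def extract(movie_ids, fetch_movie, path, retries=3, delay=1.0):
    """Append one row per movie, retrying failed fetches and skipping saved IDs."""
    done = already_saved_ids(path)
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for movie_id in movie_ids:
            if movie_id in done:
                continue
            for attempt in range(1, retries + 1):
                try:
                    record = fetch_movie(movie_id)
                except OSError:
                    if attempt == retries:
                        raise  # rows written so far are already on disk
                    time.sleep(delay)
                else:
                    writer.writerow({k: clean(record.get(k)) for k in FIELDS})
                    f.flush()
                    done.add(movie_id)
                    break

def fake_fetch(movie_id):
    """Stand-in for a real API call (assumption); returns a plain dict."""
    return {"id": movie_id, "title": f"Movie\n{movie_id}", "budget": 0, "revenue": 0}

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "movies.csv")
    extract([1, 2], fake_fetch, out)
    extract([1, 2, 3], fake_fetch, out)  # the second run only fetches id 3
    print(sorted(already_saved_ids(out)))  # [1, 2, 3]
```

Flushing after every row means a crash loses at most the movie being fetched, and rerunning the same command continues from where the file left off.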

References
