This project explores how the sentiment of movie reviews impacts the success of future films for directors and actors. Using Natural Language Processing (NLP), sentiment analysis, and various statistical techniques, we analyze the relationship between review sentiments and box office performance. This research provides insights into how reviews can drive success in the entertainment industry, helping organizations understand the role of public perception in future projects.
How do mentions of actors and directors in movie reviews, along with the sentiment expressed, influence their future projects' revenue and success metrics?
- Programming Languages: Python
- Libraries: Pandas, Scikit-learn, NLTK (NLP), Matplotlib, Seaborn
- Statistical Models: Mann-Whitney U Test, Spearman’s Rank Correlation, Generalized Linear Models (GLM), Mixed-Effects Models, Cohen's d Effect Size
- Data Sources: Publicly available movie-related datasets from Kaggle
- Visualization: Scatter plots, bar charts, heatmaps, and time-series graphs
-
Data Collection & Preparation:
- Data was sourced from multiple publicly available datasets on Kaggle, focusing on movie reviews, revenue, and other success metrics.
- Data cleaning and merging were performed using Python, with extensive preprocessing steps to address missing or inconsistent data.
-
Sentiment Analysis:
- Reviews were tokenized, and sentiment scores were assigned using NLP libraries. These scores were aggregated for both actors and directors across multiple projects.
-
Statistical Testing:
- The relationship between sentiment scores and revenue was evaluated using statistical tests, including Mann-Whitney U Tests and Spearman’s Rank Correlation.
- Generalized Linear Models (GLMs) were applied to assess how much of the variance in revenue can be explained by sentiment and review volume.
- Mixed-Effects Models were used to account for random effects between actors and directors.
- Positive Sentiment Boosts Revenue: Positive reviews were found to have a statistically significant impact on revenue for both actors and directors, with large effect sizes.
- Neutral Sentiment Matters: Neutral sentiment had a borderline but notable impact, particularly for actors.
- Review Count is Critical: The number of reviews was consistently a significant predictor of future success.
- The project involved complex data cleaning and merging processes, which were more challenging than anticipated. Iterative approaches were used to ensure data quality.
- Low R-squared values in GLM models suggest that other factors beyond review sentiment may also play a role in determining revenue, which could be a focus for future research.
Key visualizations include scatter plots showing the correlation between sentiment and revenue, as well as bar charts demonstrating effect sizes for actors and directors. These can be viewed in the notebooks or visualizations section of the repository.
- Clone the repository.
- Install dependencies:
pip install -r requirements.txt
- Open the Jupyter notebook to run the analysis.
The analysis shows a strong relationship between positive sentiment in movie reviews and box office success, highlighting the importance of public perception in the entertainment industry. While reviews explain some of the variance in revenue, future research should investigate additional factors that contribute to a film's financial success.
The project follows the CRISP-DM methodology and is planned over 6 weeks, with milestones ranging from business understanding to deployment.
All sources used will be properly cited and referenced, adhering to ethical guidelines for academic and professional work.
- Movie Reviews Dataset
- The Complete Movie Dataset
- Clapper Massive Rotten Tomatoes Movies and Reviews
- 25k Movie Dataset
- Millions of Movies
- Worldwide Box Office Rankings 1977
- Weekend Box Office Summaries
- Basuroy, S., & Ravid, S. A. (2006). The Power of Stars: Do Star Actors Drive the Success of Movies? Journal of Marketing, 70(4), 1-20.
- Elberse, A. (2013). Blockbusters: Hit-making, Risk-taking, and the Big Business of Entertainment. Henry Holt and Co.
- Epstein, E. J. (2012). The Hollywood Economist 2.0: The Hidden Financial Reality Behind the Movies. Melville House.