The-Big-Bang-Theory-Scripts-Assignment

Overview

This project applies text mining to scripts from the first 10 seasons of "The Big Bang Theory", focusing on the character Howard Wolowitz. Participants preprocess the dialogue, then analyze sentence and word counts, mentions of nouns and person names, important words per episode and season, and word co-occurrence.

Dataset

The dataset is a CSV file of script lines with attributes such as episode name, dialogue, and person scene. Participants can import it with Python and pandas.
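
As a rough illustration of the import step, the sketch below reads a CSV with pandas and keeps only Howard's lines. The column names `episode_name`, `dialogue`, and `person_scene` are assumptions derived from the attribute list above, and the rows are invented placeholders rather than real script lines; check the actual header of `scripts.csv` before reusing it. The helper name `load_character_lines` is hypothetical.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV. The column names are assumptions
# based on the attribute list in this README, and the rows are placeholders.
SAMPLE_CSV = io.StringIO(
    "episode_name,dialogue,person_scene\n"
    "Series 01 Episode 01,Placeholder line one.,Sheldon\n"
    "Series 01 Episode 01,Placeholder line two. Placeholder line three.,Howard\n"
    "Series 01 Episode 02,Placeholder line four.,Howard\n"
)


def load_character_lines(source, character="Howard"):
    """Read the script CSV and keep only the rows spoken by `character`."""
    scripts = pd.read_csv(source)
    # Drop rows without dialogue so later tokenization never sees NaN.
    scripts = scripts.dropna(subset=["dialogue"])
    is_character = scripts["person_scene"].str.strip() == character
    return scripts.loc[is_character].reset_index(drop=True)


howard = load_character_lines(SAMPLE_CSV)
print(howard[["episode_name", "dialogue"]])
```

For the real data, pass the path to the CSV file instead of `SAMPLE_CSV`, and adjust `character` if the speaker column stores names in a different form.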

Instructions

Preprocessing Steps:
- Remove punctuation and convert text to lowercase.
- Perform sentence and word tokenization.
- Remove stopwords using the NLTK library.
- Apply the Porter stemmer for stemming.
- Perform POS tagging and NER using the spaCy library.
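
The first four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the assignment's reference solution: the regex sentence splitter and whitespace tokenizer stand in for NLTK's `sent_tokenize` and `word_tokenize` (which need the downloaded `punkt` data), and `DEMO_STOPWORDS` is a tiny hypothetical set standing in for `nltk.corpus.stopwords.words("english")`. Sentences are split before punctuation is stripped, because sentence boundaries depend on it. POS tagging and NER are left out because they require a downloaded spaCy model and work best on the raw, unstemmed text.

```python
import re
import string

from nltk.stem import PorterStemmer

# Tiny illustrative stopword set (an assumption for this sketch). In the
# assignment, use nltk.corpus.stopwords.words("english") after running
# nltk.download("stopwords").
DEMO_STOPWORDS = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "i"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)
_STEMMER = PorterStemmer()


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def preprocess(text, stopwords=DEMO_STOPWORDS):
    """Return one list of stemmed, stopword-free tokens per sentence."""
    processed = []
    for sentence in split_sentences(text):
        # Strip punctuation and lowercase, then tokenize on whitespace.
        cleaned = sentence.translate(_PUNCT_TABLE).lower()
        tokens = cleaned.split()
        kept = [token for token in tokens if token not in stopwords]
        processed.append([_STEMMER.stem(token) for token in kept])
    return processed


print(preprocess("The cats are running. Dogs!"))
```

The number of inner lists gives the sentence count and their lengths give the word counts, which feeds directly into the per-episode averages asked for below.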

Analysis Questions (Howard Wolowitz):
- a. Compute the average number of sentences and words per episode, and explore how these averages deviate across seasons.
- b. Count global (dataset-wide) mentions of nouns and person names.
- c. Identify important words per episode and season using TF-IDF and bag-of-words, and visualize them as a word cloud.
- d. Examine word co-occurrence using Positive Pointwise Mutual Information (PPMI).
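
For question d, PPMI is commonly defined as PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))), where the probabilities are estimated from co-occurrence counts. The sketch below shows one way to compute it with the standard library only. It assumes each document is already a list of preprocessed tokens and that co-occurrence means "within a symmetric window of `window` tokens"; the window size and the function names `cooccurrence_counts` and `ppmi` are illustrative choices, not requirements of the assignment.

```python
import math
from collections import Counter


def cooccurrence_counts(documents, window=2):
    """Count ordered (word, context) pairs at most `window` tokens apart."""
    pair_counts = Counter()
    for tokens in documents:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c))))."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), count in pair_counts.items():
        word_totals[word] += count
        context_totals[context] += count
    scores = {}
    for (word, context), count in pair_counts.items():
        # P(w, c) / (P(w) P(c)) simplifies to count * total / (row * column).
        ratio = count * total / (word_totals[word] * context_totals[context])
        scores[(word, context)] = max(0.0, math.log2(ratio))
    return scores


# Invented token lists, standing in for preprocessed dialogue lines.
docs = [["a", "b"]] * 3 + [["a", "c"]] + [["d", "c"]] * 3
scores = ppmi(cooccurrence_counts(docs, window=1))
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs that co-occur less often than chance would predict get a negative PMI, which the `max(0.0, ...)` clips to zero; that clipping is what makes the measure "positive".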

