Analysis of Different Types of Communication

In this project, we will explore the fundamental differences between the language used in:

Speeches
Songs
Potentially more to come (poems, essays)

Important Packages Used

NLTK
pandas
scikit-learn
selenium
Beautiful Soup 4
TextBlob
re (part of Python's standard library, but still important)

Sample Selection

Several speeches and several albums' worth of song lyrics will be analyzed. The current maximum is around 30 of each, limited by the selection available to scrape. Popularity itself is difficult to quantify, so I chose albums by sales (the best-selling albums) and speeches based on what was available.

Methodology for sample selection is very important. You want selection processes to have as little bias as possible, but this is a for-fun project, so I hope you'll excuse me for picking speeches from James Clear's website, which lists them alphabetically by the orator's last name.

Please do not neglect selection methodology (as I have here) if you are doing a project where the results matter.

Data Collection

Web scraping will be done with both Selenium and Beautiful Soup (so I can learn to use both). Details on when one is the better choice are in the preprocessing notebooks, along with some notes on how to be a good website patron when scraping. You don't want to slow down or crash their site, get banned, or worse.
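The "good website patron" rules can be sketched with Python's standard library alone. The class below is a minimal, hypothetical example, not the project's scraper: it checks robots.txt before each request, waits at least `min_delay` seconds between requests (or the site's Crawl-delay, if that is longer), and identifies itself with a User-Agent header. The class name, the User-Agent string and the default delay are assumptions for illustration.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical User-Agent string; use one that identifies you and your project.
USER_AGENT = "communication-style-analysis-bot"


class PoliteFetcher:
    """Fetch pages while honoring robots.txt and a minimum delay between requests."""

    def __init__(self, robots_lines, min_delay=2.0, user_agent=USER_AGENT):
        self.user_agent = user_agent
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)  # lines of the site's robots.txt
        # Respect the site's Crawl-delay if it is longer than our own minimum.
        site_delay = self.robots.crawl_delay(user_agent) or 0
        self.delay = max(min_delay, site_delay)
        self._last_request = None  # time.monotonic() of the previous request

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self.robots.can_fetch(self.user_agent, url)

    def wait_time(self, now):
        """Seconds still to wait at time `now` before the next request."""
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def fetch(self, url, timeout=10):
        """Download url as text, refusing disallowed URLs and throttling requests."""
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        time.sleep(self.wait_time(time.monotonic()))
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        finally:
            # Record the attempt even if it failed, so retries are throttled too.
            self._last_request = time.monotonic()
```

To use it, download the site's robots.txt once, pass its lines to `PoliteFetcher`, and hand the HTML returned by `fetch` to Beautiful Soup, e.g. `BeautifulSoup(html, "html.parser")`. Pages that need JavaScript still call for Selenium; the same delay logic can be applied between `driver.get` calls.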

Next Steps/In Progress

Comparison of LDA and NMF
Add more data (involves debugging the scraper since not all web pages have the same format)
Add visualizations of relevant data

Data Sources

Speech transcripts from: https://jamesclear.com/great-speeches/

Album selection based on: https://www.businessinsider.com/50-best-selling-albums-all-time-2016-9

Song lyrics from: https://www.azlyrics.com/

Folders and files

data_n=10
img
.gitignore
0_webscraping.ipynb
1_preprocessing.ipynb
2_analysis.ipynb
README.md
communication_data.xlsx
communication_data_bow.xlsx
communication_data_clean.xlsx
songs_raw.xlsx
speeches_raw.xlsx

deltalite/Communication-Style-Analysis-Using-NLP