Sentiment-Analysis-NYT-Articles

Sentiment analysis of New York Times (NYT) immigration articles using the VADER and TextBlob dictionaries.

Project poster: https://scholar.valpo.edu/cus/917/

Tools used

RStudio (R)
Jupyter Notebooks (Python)

First Step: Extracting Data from the NYT API

Folder: 1-NYT API Data Extraction

Created an R notebook.
Extracted the articles into a table-like structure following https://www.storybench.org/working-with-the-new-york-times-api-in-r/
Extracted immigration articles using the query "migrant OR immigration OR immigrant OR migration OR refugee OR alien OR undocumented OR asylum" to capture a broad range of immigration coverage.
Raw data from 1981 to 2020 is in Exctracting NYT API (allNYTSearch1981to2020). Articles were extracted starting from the last date of the previous requests until the API reached its maximum number of calls. This was done manually; with the query specified in the notebook, extraction usually stopped after roughly every two years of data.
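The extraction notebook itself is written in R and follows the Storybench tutorial. As a rough Python analogue, the sketch below builds one Article Search API request for a date window and flattens the returned documents into table-like rows. It assumes the public endpoint `https://api.nytimes.com/svc/search/v2/articlesearch.json` with the `q`, `begin_date`, `end_date`, `page`, and `api-key` parameters; the helper names `build_request_url` and `flatten_docs` and the selected fields are hypothetical, not the repository's code.

```python
from urllib.parse import urlencode

# Assumed public Article Search endpoint (not taken from the repository).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
# The immigration query used in this project.
QUERY = ("migrant OR immigration OR immigrant OR migration OR refugee "
         "OR alien OR undocumented OR asylum")


def build_request_url(api_key, begin_date, end_date, page=0):
    """Build one request URL; begin_date and end_date are YYYYMMDD strings."""
    params = {
        "q": QUERY,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,  # results are paginated, roughly 10 documents per page
        "api-key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def flatten_docs(payload):
    """Turn a decoded JSON response into a list of flat row dicts.

    The chosen fields are a hypothetical selection for illustration.
    """
    rows = []
    for doc in payload.get("response", {}).get("docs", []):
        rows.append({
            "pub_date": doc.get("pub_date"),
            "headline": (doc.get("headline") or {}).get("main"),
            "lead_paragraph": doc.get("lead_paragraph"),
            "web_url": doc.get("web_url"),
        })
    return rows
```

To fetch a page, one could pass the URL to `urllib.request.urlopen` and decode the body with `json.load`; pausing between calls helps stay under the API's request limits, which is the ceiling the manual extraction kept reaching.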

Second Step: Preprocessing and Training

Folder: 2-Standarization and Training

Dropped duplicate articles.
Data standardization: normalized the lead_paragraph field with the preprocess_regex function.
TextBlob and VADER assigned an initial polarity score between -1 and 1 to each lead paragraph using their default dictionaries.
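The repository's preprocess_regex function is not reproduced here. The sketch below shows one plausible version of the preprocessing step using only the standard library: deduplication keyed on the article URL, followed by a regex cleanup of the lead paragraph. The names `normalize_lead`, `drop_duplicate_articles`, and `prepare`, the choice of `web_url` as the duplicate key, and the exact regex rules are all assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Keep letters, digits, whitespace, and basic punctuation (hypothetical rule).
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9'.,!?\s-]")
SPACE_RE = re.compile(r"\s+")


def normalize_lead(text):
    """Hypothetical stand-in for preprocess_regex: clean one lead paragraph."""
    if not text:
        return ""
    text = URL_RE.sub(" ", text)       # drop URLs
    text = NON_TEXT_RE.sub(" ", text)  # drop stray symbols
    return SPACE_RE.sub(" ", text).strip()  # collapse whitespace


def drop_duplicate_articles(articles, key="web_url"):
    """Keep the first article per key value, preserving order.

    Articles missing the key are kept rather than merged together.
    """
    seen = set()
    unique = []
    for article in articles:
        value = article.get(key)
        if value is None:
            unique.append(article)
            continue
        if value in seen:
            continue
        seen.add(value)
        unique.append(article)
    return unique


def prepare(articles):
    """Deduplicate, then attach a cleaned copy of the lead paragraph."""
    return [
        {**row, "lead_clean": normalize_lead(row.get("lead_paragraph"))}
        for row in drop_duplicate_articles(articles)
    ]
```

Case and punctuation are deliberately kept because VADER uses capitalization and exclamation marks as intensity cues. The cleaned text can then be scored with `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the vaderSentiment package and `TextBlob(text).sentiment.polarity` from TextBlob; both return values between -1 and 1.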

Third Step: Identify problematic words / Recode words into the correct sphere of the VADER dictionary

Folder: 3-Recoding Words by Hand

Identified problematic words and their n-gram counterparts, such as "united", which in most cases was part of "United States" rather than a sentiment-bearing word.
Recoded individual words into the correct sphere, using inter-rater reliability measures to decide where each word belongs.
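One way to surface words like "united" is to count the bigrams a lexicon word appears in; if "united states" dominates, the word is mostly not expressing sentiment. The sketch below also shows why recoding matters: VADER sums per-word valences (roughly -4 to +4 in its lexicon) and maps the sum into (-1, 1) with x / sqrt(x^2 + 15). The helper names and the toy valence values are hypothetical, and `simple_compound` deliberately omits VADER's negation, capitalization, and punctuation rules.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokens(text):
    """Lowercase word tokens (simplified tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def bigrams_containing(texts, word):
    """Count bigrams that contain `word`, to see which phrase it belongs to."""
    counts = Counter()
    for text in texts:
        toks = tokens(text)
        for a, b in zip(toks, toks[1:]):
            if word in (a, b):
                counts[(a, b)] += 1
    return counts


def normalize_compound(total, alpha=15.0):
    """VADER-style normalization of a summed valence into (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def simple_compound(text, lexicon):
    """Sum lexicon valences and normalize; ignores VADER's heuristics."""
    total = sum(lexicon.get(t, 0.0) for t in tokens(text))
    return normalize_compound(total)


texts = [
    "The United States expanded asylum",
    "united states policy on refugees",
    "Families united after years",
]
print(bigrams_containing(texts, "united").most_common(1))
# Hypothetical valence before and after recoding "united" to neutral.
print(simple_compound("The United States", {"united": 2.0}))
print(simple_compound("The United States", {"united": 0.0}))
```

In the vaderSentiment package the analyzer's lexicon is held in a plain dict attribute (`SentimentIntensityAnalyzer().lexicon`), so recoded valences could be applied with `dict.update`; it is worth confirming this against the installed version.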

Fourth Step: Retrain individual words, filter USA/Latino news, and plot time series

Folder: 4-Retrain and Filter

Allocated the words identified in Step 3 into the correct sphere of the VADER dictionary, since VADER was identified as the better of the two dictionaries.
Filtered Latino news using the query specified in the "latino query" folder.
Filtered news written in the United States.
Normalized scores from the range -1 to 1 to the range 0 to 100 (percent).
Computed yearly, quarterly, and monthly means of the normalized scores.
Plotted time series of the normalized sentiment scores for all articles, all Latino articles, USA articles, and USA Latino articles.
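Assuming the normalization is a linear rescaling, it is p = (s + 1) / 2 * 100, so -1 maps to 0, 0 to 50, and 1 to 100. The sketch below applies that map and computes yearly, quarterly, and monthly means, assuming each article is reduced to a (publication date, compound score) pair. The names `to_percent`, `period_key`, and `period_means` are hypothetical; a pandas groupby would serve equally well.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def to_percent(score):
    """Linearly map a score in [-1, 1] onto [0, 100]."""
    return (score + 1.0) / 2.0 * 100.0


def period_key(d, freq):
    """Bucket a date by year, (year, quarter), or (year, month)."""
    if freq == "year":
        return (d.year,)
    if freq == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if freq == "month":
        return (d.year, d.month)
    raise ValueError(f"unknown freq: {freq}")


def period_means(records, freq):
    """Mean normalized score per period, sorted chronologically.

    records: iterable of (datetime.date, compound score in [-1, 1]) pairs.
    """
    buckets = defaultdict(list)
    for d, score in records:
        buckets[period_key(d, freq)].append(to_percent(score))
    return {k: mean(v) for k, v in sorted(buckets.items())}


records = [
    (date(2019, 1, 5), 0.5),
    (date(2019, 2, 1), -0.5),
    (date(2019, 4, 10), 1.0),
    (date(2020, 1, 1), 0.0),
]
print(period_means(records, "quarter"))
```

The resulting dictionaries are already in time order, so they can be passed directly to a plotting library to draw the time series for each article subset.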

Fifth Step: Final Sentiment Output

Folder: 5-Final Sentiment Output

Output the retrained VADER scores for all the data (NYT_data_1980_to_2020_Retrained).
Output only the articles from the United States (US_News_Articles).
Output the Latino immigration articles from the United States (US_Latino_News_Articles).

Extra Analysis

Additional analysis, for further research, that aggregates the data monthly, quarterly, and yearly.

Web App Predictions (still in development)

Build a machine learning model on the retrained VADER data.
Build a web application that predicts the sentiment of new articles from the retrained data, and create a service that ingests new articles via the NYT API.

Repository contents

.ipynb_checkpoints
1-NYT API Data Extraction
2-Standarization and Training
3-Recoding Words by Hand
4-Retrain and Filter
5-Final Sentiment Output
Extra Analysis
Web App Razor Predictions
WebAppPredictions-API/NYT.Sentiment.API
.gitignore
README.md

Repository: gcarvajal1222/Sentiment-Analysis-NYT-Immigration-Articles