GitHub - taylor-yamashita/japan-minority-twitter-nlp

Deconstructing “Japanese-ness”: Analyzing Perceptions of Japan's Ethnic Minorities on Twitter Using Natural Language Processing

This repository contains the code for my Princeton University senior thesis research, conducted under the supervision of Dr. Christiane Fellbaum and written in fulfillment of my degree in Computer Science and minor in Japanese Language and Culture.

Abstract: This study analyzes Japanese tweets via natural language processing techniques, in order to provide insight into current perceptions of ethnic minorities in Japan. While existing work has studied different topics on Japanese Twitter or researched online racism in other countries, most studies rely on manual labeling and purely qualitative analysis, and none have studied online racial discrimination in a Japanese context. Drawing on the existing sociological literature, this research fills this gap by introducing a rigorous quantitative component to the study of Japan’s minorities, applying Word2Vec semantic vector similarity analysis and Latent Dirichlet Allocation (LDA) analysis to sixteen total tweet datasets. By studying changes over time in common topics and keywords associated with different minority groups, this work compares and contrasts discourse pertaining to Zainichi Koreans, native Ainu and Okinawan populations, Hāfu, and Southeast Asian migrants, in order to develop a broadened understanding of racial discrimination and attitudes toward minorities in Japan today. The data suggest that the conversation surrounding a minority group generally reflects a level of awareness of social issues that corresponds to the group’s history of advocacy efforts and established presence in Japan.

Guide

thesis_twscrape.ipynb: code for scraping tweets
thesis_preprocess.ipynb: code for preprocessing tweets
thesis_w2v.ipynb: code for running Word2Vec analysis
thesis_lda.ipynb: code for running LDA analysis
thesis_lda_run.ipynb: runs the code in thesis_lda.ipynb on all minority keyword datasets

Requirements

Install the requirements from the included Pipfile with pipenv by running pipenv install in the repository root.

Pretrained Word2Vec Models

To load the model trained on the general tweet dataset: w2v = gensim.models.Word2Vec.load("saved_w2v_models/thesis_w2v_[YEAR]")

Available years: 2015, 2022
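
As a minimal sketch of how the load call above might be wrapped, the snippet below builds the model path from a year and looks up the nearest neighbors of a query token. The helper names (w2v_model_path, nearest_neighbors) are hypothetical and not part of this repository, and the sketch assumes it is run from the repository root with gensim installed. gensim is imported lazily, so the path helper works on its own.

```python
from pathlib import Path

# Years listed in this README for the general tweet models.
AVAILABLE_YEARS = (2015, 2022)


def w2v_model_path(year, root="saved_w2v_models"):
    """Build the path of the saved Word2Vec model for a given year.

    Hypothetical helper: it only fills [YEAR] into the README's path pattern.
    """
    if year not in AVAILABLE_YEARS:
        raise ValueError(f"year must be one of {AVAILABLE_YEARS}, got {year!r}")
    return Path(root) / f"thesis_w2v_{year}"


def nearest_neighbors(year, query, topn=10, root="saved_w2v_models"):
    """Load the model for `year` and return the tokens most similar to `query`.

    Returns an empty list if `query` is not in the model's vocabulary.
    """
    # Imported here so that the path helper can be used without gensim.
    from gensim.models import Word2Vec

    w2v = Word2Vec.load(str(w2v_model_path(year, root)))
    if query not in w2v.wv:
        return []
    # Each entry is a (token, cosine similarity) pair.
    return w2v.wv.most_similar(query, topn=topn)
```

For example, nearest_neighbors(2015, "アイヌ") and nearest_neighbors(2022, "アイヌ") could be compared side by side to see how a keyword's neighbors differ between the two years. The query token here is only illustrative and must match the tokenization used during preprocessing.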

Pretrained LDA (Latent Dirichlet Allocation) Models

To load a model trained on a minority keyword tweet dataset: lda = gensim.models.LdaModel.load("saved_lda_models/lda_model_[KEYWORD]_[YEAR]")

Available keywords: "zainichi" (在日コリアン), "ainu" (アイヌ), "okinawa" (沖縄人), "ryukyujin" (琉球人), "haafu" (ハーフ), "vietnamjin" (ベトナム人), "philippinejin" (フィリピン人)
Available years: 2015, 2022
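
The keyword and year lists above can be combined into a model path as in the following sketch, which then reads the top words of every topic from the loaded model. The helper names (lda_model_path, top_topic_words) are hypothetical and not part of this repository. The sketch assumes it is run from the repository root with gensim installed, and that a saved model exists for each keyword and year combination.

```python
from pathlib import Path

# Keywords and years listed in this README for the LDA models.
AVAILABLE_KEYWORDS = (
    "zainichi", "ainu", "okinawa", "ryukyujin",
    "haafu", "vietnamjin", "philippinejin",
)
AVAILABLE_YEARS = (2015, 2022)


def lda_model_path(keyword, year, root="saved_lda_models"):
    """Build the path of the saved LDA model for a keyword and year.

    Hypothetical helper: it only fills [KEYWORD] and [YEAR] into the
    README's path pattern.
    """
    if keyword not in AVAILABLE_KEYWORDS:
        raise ValueError(f"unknown keyword {keyword!r}")
    if year not in AVAILABLE_YEARS:
        raise ValueError(f"year must be one of {AVAILABLE_YEARS}, got {year!r}")
    return Path(root) / f"lda_model_{keyword}_{year}"


def top_topic_words(keyword, year, num_words=10, root="saved_lda_models"):
    """Load the model and return {topic_id: [word, ...]} for every topic."""
    # Imported here so that the path helper can be used without gensim.
    from gensim.models import LdaModel

    lda = LdaModel.load(str(lda_model_path(keyword, year, root)))
    # num_topics=-1 asks for all topics; formatted=False yields
    # (topic_id, [(word, probability), ...]) tuples.
    topics = lda.show_topics(num_topics=-1, num_words=num_words, formatted=False)
    return {tid: [word for word, _ in words] for tid, words in topics}
```

Calling top_topic_words("ainu", 2015) and top_topic_words("ainu", 2022) would give two topic-to-words mappings that can be compared by hand. Topic ids are not aligned across separately trained models, so topics should be matched by their words rather than by id.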

