Topic Modelling and Dictionary-Based Analysis
- Folder "extract" is for preprocessing tweets or articles
- Folder "Topic":
  - A text cleaner that removes punctuation and stop words and stems English words, as needed
  - Latent Dirichlet Allocation (LDA) for topic modelling of both English and Chinese text, available in Python and R
- Folder "dictionary_analysis":
  - Dictionary-based analysis to determine whether the articles or tweets discuss a given subject, with both partial and exact text matching
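
The cleaning and dictionary-matching steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code shipped in the folders: the names `clean_text`, `match_terms`, and `STOP_WORDS` are hypothetical, the stop-word list is a tiny placeholder, and stemming is left out.

```python
import string

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def clean_text(text, remove_stop_words=True):
    """Lowercase the text, strip ASCII punctuation, and split into tokens.

    Stop words are dropped when remove_stop_words is True.
    """
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def match_terms(tokens, dictionary, exact=True):
    """Return the sorted dictionary terms found in the tokens.

    exact=True  : a term must equal a whole token.
    exact=False : a term may appear as a substring of a token (partial match).
    """
    if exact:
        token_set = set(tokens)
        return sorted(term for term in dictionary if term in token_set)
    return sorted(
        term for term in dictionary if any(term in tok for tok in tokens)
    )


if __name__ == "__main__":
    tokens = clean_text("The vaccines are arriving in Hong Kong!")
    subject_terms = {"vaccine", "kong"}  # hypothetical subject dictionary
    print(match_terms(tokens, subject_terms, exact=True))
    print(match_terms(tokens, subject_terms, exact=False))
```

In this sketch, exact matching misses "vaccine" because the text contains only the plural "vaccines", while partial matching catches it. Stemming the tokens before matching is one way to narrow that gap without resorting to substring matches.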