This project was created for the Unstructured Text Analysis course at the Central European University. Special thanks to Eduardo Ariño de la Rubia, the professor of this course, from whom I learned a lot. The course is based on the wonderful book Text Mining with R: A Tidy Approach.
The whole analysis is implemented in R and is available in this GitHub repository. The code falls into two main parts:
- Web scraping
- Unstructured text analysis
Throughout the whole process, I tried to follow the main principles of clean code, so anyone with R knowledge should be able to follow it easily. This project leaves plenty of room for further development, so if you are interested in contributing, feel free to contact me (David Utassy). I will not show the whole code in this document, but I will highlight the most important snippets.