
This project was created for the Unstructured Text Analysis course at Central European University. Special thanks to Eduardo Ariño de la Rubia, the professor of the course, from whom I learned a lot. The course is based on the wonderful book Text Mining with R: A Tidy Approach.

Technical introduction

The whole analysis is implemented in R and is available in this GitHub repository. The code falls into two main parts:

  • Web scraping part
  • Unstructured text analysis part

Throughout the process, I tried to follow the main principles of clean code, so anyone with R knowledge should be able to follow it easily. There is plenty of room for further development in this project, so if you are interested in contributing, feel free to contact me (David Utassy). I will not show the whole code in this document, only the most important snippets.