Corpus-Clustering-Project

Various text analyses applied to science-related texts in the COCA corpus

This project analyzes science-related texts in COCA (the Corpus of Contemporary American English). Currently, the magazine, academic, and news sections of the corpus are used.
The data pipeline can be briefly described as follows: