Various text analyses applied to science-related texts in the COCA corpus

PolarBear77/Corpus-Clustering-Project

Corpus-Clustering-Project

Various text analyses applied to science-related texts in the COCA corpus

This project analyzes science-related texts in COCA (the Corpus of Contemporary American English). Currently, the magazine, academic, and news sections of the corpus are used.
The data pipeline can be briefly described as follows:

  1. Preprocessing
  • Sort out the text IDs that are related to science
  • Done by sorting the Excel sheet that contains the text ID numbers and their specifications
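
The preprocessing step above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual script: the column names `textID` and `subgenre`, the keyword `"sci"`, and the toy rows are all assumptions made for the example, since the layout of the real Excel sheet is not described here.

```python
import pandas as pd


def science_text_ids(sheet, id_col="textID", spec_col="subgenre", keyword="sci"):
    """Return the IDs of rows whose specification column mentions `keyword`.

    The column names and the keyword are hypothetical placeholders; adjust
    them to match the actual spreadsheet.
    """
    # Sort by the specification column so related texts sit together.
    ordered = sheet.sort_values(spec_col, kind="stable")
    # Keep only rows whose specification contains the keyword (case-insensitive).
    mask = ordered[spec_col].astype(str).str.contains(keyword, case=False, regex=False)
    return ordered.loc[mask, id_col].tolist()


# Toy stand-in for the spreadsheet (invented rows, for illustration only).
sheet = pd.DataFrame(
    {
        "textID": [101, 102, 103, 104],
        "genre": ["MAG", "ACAD", "NEWS", "MAG"],
        "subgenre": ["Sci/Tech", "History", "Science", "Sports"],
    }
)

ids = science_text_ids(sheet)
print(ids)
```

With a real sheet, the DataFrame would typically be loaded with `pd.read_excel(...)` instead of being built by hand; the file name and column labels depend on the spreadsheet used.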
