Reference corpora for authorship attribution studies

cophi-wue/refcor

refcor

Reference corpora for authorship attribution studies.

This repository contains three collections of novels developed for stylometric authorship attribution studies. Each collection contains seventy-five novels by twenty-five different authors, with three novels per author.

| | German | English | French |
|---|---|---|---|
| Source of the texts | TextGrid | Gutenberg | Ebooks libres et gratuits |
| Range of original publication dates | 1774–1926 | 1838–1921 | 1827–1934 |
| Total number of tokens | 10,354,989 | 11,771,901 | 7,401,126 |
| Length of shortest novel (tokens) | 19,820 | 40,720 | 33,501 |
| Length of longest novel (tokens) | 761,821 | 456,637 | 209,992 |
| Mean length of novels (tokens) | 138,067 | 156,958 | 98,681 |
| Standard deviation of novel length (tokens) | 134,857 | 85,890 | 42,194 |
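
As a quick sanity check on the figures above, the mean novel length of each collection should equal its total token count divided by seventy-five (twenty-five authors, three novels each). The sketch below is only an illustration: the names `reported` and `mean_length` are made up for this example and are not part of the repository, and the numbers are copied by hand from the table.

```python
# Illustrative check only; the names below are hypothetical, not repository code.
NOVELS_PER_CORPUS = 25 * 3  # 25 authors, three novels each

# Total token counts and mean lengths as reported in the table.
reported = {
    "German": {"total": 10_354_989, "mean": 138_067},
    "English": {"total": 11_771_901, "mean": 156_958},
    "French": {"total": 7_401_126, "mean": 98_681},
}


def mean_length(total_tokens, n_novels=NOVELS_PER_CORPUS):
    """Return the mean novel length in tokens for one collection."""
    return total_tokens / n_novels


for language, row in reported.items():
    computed = mean_length(row["total"])
    print(f"{language}: computed {computed:,.2f}, reported {row['mean']:,}")
```

The computed means agree with the reported ones to within one token; the small differences presumably come from how the published values were rounded.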
