From c8322f0eabdadc2c7ff1c4a7cc4e8aa012247186 Mon Sep 17 00:00:00 2001
From: charlottejmc <143802849+charlottejmc@users.noreply.github.com>
Date: Wed, 22 Nov 2023 15:10:26 +0000
Subject: [PATCH] Update text-mining-with-extracted-features.md

Reinsert fixed DOIs for HathiTrust datasets
And turn DOI into a link
---
 en/lessons/text-mining-with-extracted-features.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/en/lessons/text-mining-with-extracted-features.md b/en/lessons/text-mining-with-extracted-features.md
index d6c2f09cb..fe039a18a 100755
--- a/en/lessons/text-mining-with-extracted-features.md
+++ b/en/lessons/text-mining-with-extracted-features.md
@@ -58,7 +58,7 @@ This tutorial teaches the fundamentals of using the Extracted Features dataset w
 
 Though it is relatively new, the Extracted Features dataset is already seeing use by scholars, as seen on a [page collected by the HTRC](https://wiki.htrc.illinois.edu/display/COM/Extracted+Features+in+the+Wild).
 
-[Underwood](https://doi.org/10.6084/m9.figshare.1279201) leveraged the features for identifying genres, such as fiction, poetry, and drama (2014). Associated with this work, he has released a dataset of 178k books classified by genre alongside genre-specific word counts (Underwood 2015).
+[Underwood](https://doi.org/10.6084/m9.figshare.1279201) leveraged the features for identifying genres, such as fiction, poetry, and drama (2014). Associated with this work, he has released a dataset of 178k books classified by genre alongside genre-specific word counts ([Underwood 2015](https://doi.org/10.13012/J8JW8BSJ)).
 
 The Underwood subset of the Extracted Features dataset was used by Forster (2015) to [observe gender in literature](https://web.archive.org/web/20160105003327/http://cforster.com/2015/09/gender-in-hathitrust-dataset/), illustrating the decline of woman authors through the 19th century.
 
@@ -1154,7 +1154,7 @@ Finally, the repository for the HTRC Feature Reader has [advanced tutorial noteb
 
 # References
 
-Boris Capitanu, Ted Underwood, Peter Organisciak, Timothy Cole, Maria Janina Sarol, J. Stephen Downie (2016). The HathiTrust Research Center Extracted Feature Dataset (1.0) [Dataset]. *HathiTrust Research Center*.
+Boris Capitanu, Ted Underwood, Peter Organisciak, Timothy Cole, Maria Janina Sarol, J. Stephen Downie (2016). The HathiTrust Research Center Extracted Feature Dataset (1.0) [Dataset]. *HathiTrust Research Center*. [https://doi.org/10.13012/J8X63JT3](https://doi.org/10.13012/J8X63JT3)
 
 Chris Forster. "A Walk Through the Metadata: Gender in the HathiTrust Dataset." Blog. [http://cforster.com/2015/09/gender-in-hathitrust-dataset/](https://web.archive.org/web/20160105003327/http://cforster.com/2015/09/gender-in-hathitrust-dataset/).
 
@@ -1167,9 +1167,9 @@ Stéfan Sinclair & Geoffrey Rockwell (2016). "The Art of Literary Text Analysis.
 William J. Turkel and Adam Crymble (2012). "Counting Word Frequencies with Python". The Programming Historian. /lessons/counting-frequencies.
 
 Ted Underwood (2014): Understanding Genre in a Collection of a Million Volumes, Interim Report. figshare.
-https://doi.org/10.6084/m9.figshare.1281251.v1.
+[https://doi.org/10.6084/m9.figshare.1281251.v1](https://doi.org/10.6084/m9.figshare.1281251.v1)
 
-Ted Underwood, Boris Capitanu, Peter Organisciak, Sayan Bhattacharyya, Loretta Auvil, Colleen Fallaw, J. Stephen Downie (2015). "Word Frequencies in English-Language Literature, 1700-1922" (0.2) [Dataset]. *HathiTrust Research Center*.
+Ted Underwood, Boris Capitanu, Peter Organisciak, Sayan Bhattacharyya, Loretta Auvil, Colleen Fallaw, J. Stephen Downie (2015). "Word Frequencies in English-Language Literature, 1700-1922" (0.2) [Dataset]. *HathiTrust Research Center*. [https://doi.org/10.13012/J8JW8BSJ](https://doi.org/10.13012/J8JW8BSJ)
 
 Hadley Wickham (2011). "The split-apply-combine strategy for data analysis".
 *Journal of Statistical Software*, 40(1), 1-29.