Presidential Documents Ngram Viewer

Purpose

Explore how often US presidents used particular words and phrases over time, across 118,561 official documents and transcripts. The viewer answers questions such as:

What does word usage by US presidents look like over time?
Who said it first? Last? Most?
Where was it said?
When was it said?
Can I preview the documents it was said in?

Data

The primary data source was The American Presidency Project at https://www.presidency.ucsb.edu/. I wrote a web scraper in R to capture all 118,561 official documents and associated metadata including:

Date
Location
Categories
President
Citation
Document URI
Word count
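
As a rough illustration of how these metadata fields support the "first, last, most" questions from the Purpose section, here is a minimal sketch in Python with the standard-library sqlite3 module (not the project's R code). The table names, column names, and sample rows are all hypothetical placeholders; the project's actual schema is not documented here.

```python
import sqlite3

# Hypothetical schema: columns mirror the metadata fields listed above,
# not the project's real database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        uri TEXT PRIMARY KEY, date TEXT, location TEXT, categories TEXT,
        president TEXT, citation TEXT, word_count INTEGER
    );
    CREATE TABLE mentions (uri TEXT, ngram TEXT, n INTEGER);
""")

# Made-up placeholder rows. Dates are ISO 8601 so text order is date order.
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("doc/1", "1901-01-01", "City X", "Address", "President A", "citation 1", 100),
    ("doc/2", "1950-06-01", "City Y", "Remarks", "President B", "citation 2", 200),
    ("doc/3", "1999-12-31", "City Z", "Address", "President B", "citation 3", 300),
])
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", [
    ("doc/1", "example phrase", 1),
    ("doc/2", "example phrase", 4),
    ("doc/3", "example phrase", 2),
])

JOINED = "FROM mentions m JOIN documents d USING (uri) WHERE m.ngram = ? "

def first_use(ngram):
    """Return (president, date) of the earliest document containing the n-gram."""
    sql = "SELECT d.president, d.date " + JOINED + "ORDER BY d.date ASC LIMIT 1"
    return conn.execute(sql, (ngram,)).fetchone()

def last_use(ngram):
    """Return (president, date) of the latest document containing the n-gram."""
    sql = "SELECT d.president, d.date " + JOINED + "ORDER BY d.date DESC LIMIT 1"
    return conn.execute(sql, (ngram,)).fetchone()

def top_speaker(ngram):
    """Return (president, total) for whoever used the n-gram most often."""
    sql = ("SELECT d.president, SUM(m.n) AS total " + JOINED +
           "GROUP BY d.president ORDER BY total DESC LIMIT 1")
    return conn.execute(sql, (ngram,)).fetchone()

print(first_use("example phrase"))
print(last_use("example phrase"))
print(top_speaker("example phrase"))
```

Storing dates as ISO 8601 text keeps the "first" and "last" queries to a plain ORDER BY, which is one simple way to handle a date column in SQLite.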

The corpus totaled 117,374,146 words and was tokenized with quanteda to produce a SQLite database of 7,069,561 n-grams. The project collected n-grams of sizes 1 through 5, that is, single words up to five-word sequences.
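
To make the 1-through-5 n-gram idea concrete, here is a minimal sketch in plain Python. It is an illustration only: the project used quanteda in R, and the whitespace tokenizer, the `ngrams` helper, and the sample sentence below are assumptions that skip the punctuation and casing rules a real tokenizer applies.

```python
from collections import Counter

def ngrams(tokens, min_n=1, max_n=5):
    """Yield every contiguous run of min_n..max_n tokens, joined by spaces."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

# Naive tokenization: lowercase and split on whitespace (an assumption,
# much simpler than what quanteda does).
tokens = "We the people of the United States".lower().split()

# Count each distinct n-gram, as a frequency table would.
counts = Counter(ngrams(tokens))

print(counts["the"])                # unigram that appears twice
print(counts["the united states"])  # trigram that appears once
print(sum(counts.values()))         # total n-grams generated
```

A sentence of 7 tokens yields 7 + 6 + 5 + 4 + 3 = 25 n-grams, which shows why the n-gram table grows much faster than the raw word count.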

The corpus was further optimized for full-text search using SQLite's FTS4 extension. FTS4 also made it possible to extract snippets from the corpus data (pictured below).
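
The sketch below shows, in Python's standard-library sqlite3 module, how an FTS4 virtual table and SQLite's built-in `snippet()` function can produce highlighted excerpts. It assumes the local SQLite build has FTS4 compiled in; the `corpus` table, its columns, and the sample text are hypothetical and are not taken from the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS4 virtual table; table and column names are illustrative assumptions.
conn.execute("CREATE VIRTUAL TABLE corpus USING fts4(uri, body)")
conn.executemany("INSERT INTO corpus (uri, body) VALUES (?, ?)", [
    ("doc/1", "the economy grew steadily this year"),
    ("doc/2", "the people of the united states expect honest government"),
])

def search(phrase, context_tokens=8):
    """Return (uri, snippet) pairs for documents matching an exact phrase."""
    sql = (
        # snippet(table, start mark, end mark, ellipsis, column index, tokens)
        "SELECT uri, snippet(corpus, '[', ']', '...', 1, ?) "
        "FROM corpus WHERE corpus MATCH ?"
    )
    # Wrapping the query in double quotes requests a phrase match.
    return conn.execute(sql, (context_tokens, '"%s"' % phrase)).fetchall()

rows = search("united states")
for uri, excerpt in rows:
    print(uri, excerpt)
```

Column index 1 restricts the snippet to the `body` column, and the token count bounds the excerpt length, which suits short document previews.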

Demo

Live version available at: https://bryanfinlayson.shinyapps.io/presidential_ngram_search/

Screenshots
