User friendly web interface for searching the New York Times archive

zebbecker/nyt-archive-search

nyt-archive-search

User friendly web interface for creating datasets from the New York Times archive.

Test status | Python version: 3.9 | Code style: black | License: MIT

Archive Search

UPDATE Fall 2023: Due to changes with the New York Times API, this project is no longer able to provide the full text of each article in the search results. I am no longer hosting the application online.

Project Description

Archive Search is a web application that provides a simple, no-code way to search for and download datasets of New York Times articles on a specific topic. With Archive Search, students and researchers can quickly get started on computational text analysis projects. A custom dataset that includes the text of every article in the NYT digital archive that mentions the research topic provides a rich starting point for discovery.

The ultimate goal of the project is to make an important research method accessible to students and researchers without backgrounds in computer science. The application is pedagogical in nature: dead-simple, approachable interfaces take precedence over fine-grained control. Users should be able to move on to data analysis within minutes, rather than spending hours or days setting up custom scripts to download and process data directly from the API that the New York Times provides.


Details

The application is deployed to a small DigitalOcean droplet. The Django app runs behind a Gunicorn WSGI server. It uses the requests_html module to interact with the New York Times Article Search API (v2) and pandas to manage data collection. nyt_gatherer.py houses the primary data collection functionality.
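As a rough illustration of the kind of request and flattening involved, here is a minimal sketch. It is not the code in nyt_gatherer.py: the function names build_search_url and docs_to_rows are hypothetical, it uses only the standard library rather than requests_html, and the selected fields (headline, pub_date, web_url, snippet) are just a sample of what the Article Search API returns.

```python
from urllib.parse import urlencode

# Public endpoint of the NYT Article Search API (v2).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, begin_date, end_date, page, api_key):
    """Return the request URL for one page of results (dates are YYYYMMDD)."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def docs_to_rows(payload):
    """Flatten the docs of one decoded JSON response into flat dicts."""
    docs = (payload.get("response") or {}).get("docs") or []
    rows = []
    for doc in docs:
        rows.append({
            # Missing fields fall back to an empty string.
            "headline": (doc.get("headline") or {}).get("main", ""),
            "pub_date": doc.get("pub_date", ""),
            "web_url": doc.get("web_url", ""),
            "snippet": doc.get("snippet", ""),
        })
    return rows
```

Rows in this shape can be handed to pandas, for example pandas.DataFrame(rows).to_csv("articles.csv", index=False), to produce a CSV file.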

Because the NYT API has a strict rate limit, users wishing to conduct large searches must provide their own API key. The Full Search mode returns a CSV file with metadata and text for each relevant article within the specified date range.

The Demo mode returns a similar file, but does not require the user to input their own API key.
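To show why the rate limit matters for large searches, the sketch below pages through results with a pause between requests. It is an assumption-laden illustration, not the project's implementation: fetch_all_pages and its parameters are hypothetical names, fetch_page stands in for whatever function performs one API request, and the 12-second default delay is a placeholder that should be checked against the current limits for your key.

```python
import time


def fetch_all_pages(fetch_page, max_pages, delay_seconds=12.0, sleep=time.sleep):
    """Call fetch_page(0), fetch_page(1), ... with a pause between requests.

    fetch_page is a caller-supplied function that returns the list of
    articles for one page; an empty list is treated as the end of results.
    """
    articles = []
    for page in range(max_pages):
        if page > 0:
            # Space out requests so a long search stays under the rate limit.
            sleep(delay_seconds)
        batch = fetch_page(page)
        if not batch:
            break
        articles.extend(batch)
    return articles
```

Because every page costs one request, the number of pages, not the number of articles, determines how long a large search takes and how quickly a shared key would be exhausted.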

Developers

To run the project locally:

Requires:

  • Python 3.9
  • pip
  • Chrome or Chromium

  1. Clone the repo: git clone https://github.com/zebbecker/nyt-archive-search.git
  2. (Recommended) activate a virtual environment.
  3. Change into inner project directory: cd nyt-archive-search/nyt_archive_search/
  4. Install requirements: pip install -r requirements.txt
  5. Set default API key: open the config.py file and replace the empty NYT_API_KEY value with your own API key.
  6. Run development server: ./manage.py runserver
  7. Run automated tests: ./manage.py test

Access the app in your browser at 127.0.0.1:8000.
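For step 5, the edit to config.py might look roughly like the fragment below. Only the NYT_API_KEY name comes from the instructions above; the placeholder value is illustrative, and the real file may contain other settings that should be left untouched.

```python
# config.py (fragment): replace the empty string with your own key.
# The value shown is a placeholder, not a working API key.
NYT_API_KEY = "your-api-key-here"
```

Avoid committing a real key to a public repository.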
