nyt-archive-search

User-friendly web interface for creating datasets from the New York Times archive.

Archive Search

UPDATE Fall 2023: Due to changes with the New York Times API, this project is no longer able to provide the full text of each article in the search results. I am no longer hosting the application online.

Project Description

Archive Search is a web application that provides a simple, no-code method to search for and download datasets of New York Times articles on a specific topic. With Archive Search, students and researchers can quickly get started on computational text analysis projects. A custom dataset that includes the text of every article in the NYT digital archive that mentions the research topic provides a rich starting point for discovery.

The ultimate goal of the project is to make an important research method accessible to students and researchers without backgrounds in computer science. The application is pedagogical in nature: dead-simple, approachable interfaces take precedence over fine-grained control. Users should be able to move on to data analysis within minutes, rather than spending hours or days setting up custom scripts to download and process data directly from the API that the New York Times provides.

Details

The application is deployed to a small DigitalOcean droplet. The Django app is served by Gunicorn, a WSGI server. It uses the requests_html module to interact with the New York Times Article Search API (v2) and pandas to manage data collection. nyt_gatherer.py houses the primary data collection functionality.
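As a rough illustration of what a request to the Article Search API (v2) involves, the sketch below builds a query URL and flattens a decoded JSON response into metadata rows. It is not the code in nyt_gatherer.py: the function names build_search_url and flatten_docs are hypothetical, only the standard library is used rather than requests_html and pandas, and the response fields shown (headline.main, pub_date, web_url) are a small subset chosen for the example.

```python
from urllib.parse import urlencode

# Public endpoint of the NYT Article Search API (v2).
SEARCH_ENDPOINT = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, begin_date, end_date, page, api_key):
    """Return the request URL for one page of search results.

    Hypothetical helper. Dates use the API's YYYYMMDD format and
    pages are zero-indexed.
    """
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def flatten_docs(payload):
    """Turn a decoded JSON response into a list of flat metadata rows.

    Hypothetical helper. Missing or null fields become empty strings.
    """
    docs = (payload.get("response") or {}).get("docs") or []
    rows = []
    for doc in docs:
        rows.append({
            "headline": (doc.get("headline") or {}).get("main", ""),
            "pub_date": doc.get("pub_date", ""),
            "web_url": doc.get("web_url", ""),
        })
    return rows
```

A call such as build_search_url("climate change", "20200101", "20201231", 0, api_key) yields a URL that any HTTP client can fetch; the decoded JSON body can then be passed to flatten_docs.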

Because the NYT API has a strict rate limit, users wishing to conduct large searches must provide their own API key. The Full Search mode returns a CSV file with metadata and text for each relevant article within the specified date range.

The Demo mode returns a similar file, but does not require the user to input their own API key.
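The rate limit shapes how a multi-page search has to be collected: requests are spaced out, and the accumulated rows are written to CSV at the end. The sketch below shows one way to do that under stated assumptions. The names collect_pages, rows_to_csv, and fetch_page are hypothetical and not taken from the project, and the 12-second default delay is an assumed value that should be checked against the current NYT developer terms.

```python
import csv
import io
import time


def collect_pages(fetch_page, max_pages, delay_seconds=12.0, sleep=time.sleep):
    """Call fetch_page(0), fetch_page(1), ... with a pause between requests.

    fetch_page is a hypothetical callable that returns a list of row
    dicts for one page; an empty list means the results are exhausted.
    The sleep argument is injectable so the pacing can be tested
    without actually waiting.
    """
    rows = []
    for page in range(max_pages):
        if page > 0:
            sleep(delay_seconds)  # assumed spacing to respect the rate limit
        batch = fetch_page(page)
        if not batch:
            break
        rows.extend(batch)
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Capping max_pages is one way a demo-style search could stay within a shared key's quota, while a search run with the user's own key could raise the cap.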

Developers

To run the project locally:

Requires:

Python 3.9
pip
Chrome or Chromium

Clone the repo: git clone https://github.com/zebbecker/nyt-archive-search.git
(Recommended) activate a virtual environment.
Change into inner project directory: cd nyt-archive-search/nyt_archive_search/
Install requirements: pip install -r requirements.txt
Set default API key: open the config.py file and replace the empty NYT_API_KEY value with your own API key.
Run development server: ./manage.py runserver
Run automated tests: ./manage.py test

Access the app in your browser at http://127.0.0.1:8000.
