Acquiring files from government agencies and other data sources on the Web is crucial to data journalism.
The Python language and ecosystem offer numerous libraries for working with remote resources. In this course, we'll focus on `requests`, a third-party library that is a go-to resource because of its ease of use and flexibility. `requests` can handle a variety of scenarios, from simple downloads of data files to working with authentication-based APIs such as Twitter.
`requests` can be installed using standard package managers such as `pip` and `pipenv`. In this course, we generally use `pipenv` to install libraries into "virtual environments" that allow us to isolate software dependencies for each project.
```bash
# system install
pip install requests

# Or install in a project environment using pipenv
cd ~/Desktop/code/awesome-project
pipenv install requests
```
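To confirm the install worked, you can import the library in a Python shell and check its version (the version number printed here is just an example; yours may differ):

```python
>>> import requests
>>> requests.__version__
'2.31.0'
```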
In the below example, we use `requests` to download the animals.csv sample data to a local file. To follow along, fire up an interactive Python interpreter on the command line by typing `python` or `ipython` (if you've installed it).
```python
>>> import requests
>>> url = "https://raw.githubusercontent.com/stanfordjournalism/stanford-progj-2020/master/data/animals.csv"
>>> response = requests.get(url)
>>> response.text
'animal\ncat\ncougar\ndog\nsnake\nnarwhal\n'
>>> print(response.text)
animal
cat
cougar
dog
snake
narwhal
```
Above, we use `requests.get` to make a request for the remote file at a GitHub URL. The contents of the file are available in the `.text` attribute of the response that GitHub sends back.
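Before trusting `.text`, it's often worth checking that the request actually succeeded. A minimal sketch using attributes and methods that `requests` responses provide (the exact values shown are illustrative):

```python
>>> response.status_code         # 200 means the request succeeded
200
>>> response.raise_for_status()  # raises an exception for 4xx/5xx errors
>>> response.encoding            # requests' guess at the text encoding
'utf-8'
```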
From here, you can use standard methods for writing files or CSVs to save the content in a local file.
```python
>>> with open('animals.csv', 'w') as local_file:
...     local_file.write(response.text)
...
```
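If you want to work with the rows as data rather than raw text, Python's built-in `csv` module can parse the downloaded text directly. A minimal sketch (variable names here are just examples):

```python
>>> import csv
>>> rows = list(csv.reader(response.text.splitlines()))
>>> rows[0]    # header row
['animal']
>>> rows[1:3]  # first two data rows
[['cat'], ['cougar']]
```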
The above is a very basic use case. Check out the [requests documentation](https://requests.readthedocs.io) for details on how to handle a variety of other scenarios.
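For example, `requests.get` accepts keyword arguments for common needs such as query-string parameters and timeouts. A small sketch, using httpbin.org (a public test service that echoes requests back) as a stand-in URL:

```python
>>> response = requests.get(
...     "https://httpbin.org/get",
...     params={"state": "CA"},  # appended to the URL as ?state=CA
...     timeout=10,              # seconds to wait before giving up
... )
>>> response.url
'https://httpbin.org/get?state=CA'
```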