To quote from the Guardian:
The Counted is a project by the Guardian --- and you --- working to count the number of people killed by police and other law enforcement agencies in the United States throughout 2015 and 2016, to monitor their demographics and to tell the stories of how they died.
The Guardian makes the data behind the project available as a ZIP file and keeps that file up to date, but gives no indication of when the file changes or what has changed when it does.
Fortunately, the ZIP file contains a README and two CSV files (one for 2015 and one for 2016), which are well suited to being stored in Git, a version control system. And since the ZIP file is published on the Web, it's also easy to check it regularly for changes.
That's where this repository comes in: the ZIP file is checked every twenty minutes, and if anything has changed, the new data is committed to this repository on GitHub. By keeping track of this repo you can be sure you have the latest version of the data behind The Counted.
Data extracted from the source ZIP file is kept in the data directory on the master branch. The files themselves are not altered in any way, and all the hard work is done by the Guardian's staff.
Everything outside the data directory is not part of the source data; it is there only to support keeping the data in this repo.
Every twenty minutes, Cron runs a Python script that checks whether the data has been updated and commits any files that have changed.
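The check-and-commit step could look roughly like the sketch below. It is a minimal illustration, not the contents of scripts/update_repo.py: the helper names fetch_zip and sync_data are hypothetical, the ZIP URL is left as a parameter because it isn't given here, and the sketch assumes every file in the ZIP can be flattened into the data directory by its base name. Committing the changed files (which the real script does with github3.py) is left out.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def fetch_zip(url: str) -> bytes:
    """Download the ZIP file; the URL is supplied by the caller."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def sync_data(zip_bytes: bytes, data_dir: Path) -> List[str]:
    """Hypothetical helper: extract the ZIP into data_dir, rewriting only
    files whose bytes differ from what is already there.

    Returns the sorted names of files that were added or changed; an empty
    list means there is nothing new to commit.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    changed = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Assumption: any folders inside the ZIP are flattened into data_dir.
            target = data_dir / Path(info.filename).name
            new_bytes = archive.read(info)
            # Files are copied byte-for-byte; their contents are never altered.
            if not target.exists() or target.read_bytes() != new_bytes:
                target.write_bytes(new_bytes)
                changed.add(target.name)
    return sorted(changed)
```

Called as sync_data(fetch_zip(url), Path("data")) from a Cron job, a non-empty result would be the signal that there is something to commit; comparing file contents rather than the ZIP as a whole means a re-zipped but otherwise identical archive produces no commit.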
The script is kept in this repository as scripts/update_repo.py. To run it you need:
- Python 3
- requests
- github3.py
- python-pushover
The Python packages are listed in requirements.txt and can be installed with pip. To receive Pushover notifications you'll need a config file at ~/.pushoverrc; without one, notifications will fail silently.
The nerdiest way to keep up to date is to clone the repository and pull regularly, but if you're not of the nerd persuasion, there are a few other options:
- If you have a GitHub account, you can watch the repository. Changes to the repo will then appear on your dashboard when you're logged in.
- If you don't have a GitHub account, you can bookmark the commits page. New commits listed there mean the data has been updated.
- You can subscribe to the Atom feed (similar to RSS). Any updates to the repo will then appear in your feed reader of choice.
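If you'd rather script the check than use a feed reader, the same feed can be read with Python's standard library. This is a minimal sketch that assumes GitHub's per-branch commits feed URL pattern (https://github.com/OWNER/REPO/commits/master.atom); OWNER and REPO are placeholders because this README doesn't give the repository's path, and latest_commit_titles and fetch_feed are hypothetical helpers.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

# Atom elements live in this XML namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Placeholder: substitute the real owner and repository name.
FEED_URL = "https://github.com/OWNER/REPO/commits/master.atom"


def latest_commit_titles(feed: Union[str, bytes]) -> List[str]:
    """Return the title of each <entry> in an Atom feed, in feed order."""
    root = ET.fromstring(feed)
    return [
        # findtext returns "" for an empty <title>, so strip() is always safe.
        entry.findtext("atom:title", default="", namespaces=ATOM).strip()
        for entry in root.findall("atom:entry", ATOM)
    ]


def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the raw feed; pass the bytes to latest_commit_titles."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Only the feed's entries are read, so the feed-level title is ignored; comparing the newest title against the one seen on the previous run is enough to tell whether the data has changed.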
While there are now two CSV data files, one for 2015 and one for 2016, the Guardian's ZIP file originally contained only one, data/the-counted.csv. On 4 February 2016 that file was renamed to data/the-counted-2015.csv. Because Git doesn't record renames explicitly, the new file's full commit history isn't shown, but you can see the deleted file's history, up to 3 February 2016, on GitHub. If you're a command-line aficionado, you can clone the repo and run git log --follow -- data/the-counted.csv.