Every year, Chicago alders get $1.5 million to spend at their discretion on capital improvements in their ward through the city's Aldermanic Menu Program. Their spending is publicly available in PDF format in the Chicago Capital Improvement Archive. We are in the process of extracting and geocoding this data.
Check out the GitHub issues for things to work on.
- Clone the repo.
- Run the following command in the terminal:
pip install .
Note: When doing development work on the package, you need to re-run this command to use the latest package changes in external scripts.
The repo has two main parts: the data processing Python package and a library of scripts that use the package. If you're a newcomer, we recommend familiarizing yourself with the project by using the scripts to follow the data processing workflow outlined below.
Using the repo scripts, the data processing involves the following steps:
- Extract data from PDFs
- Post-process data (name cleanup, field separation, categorization)
- Geocode location data
- Identify location format
- Parse location into collection of street numbers or street intersections
- Get GPS coordinates from street numbers and street intersections
- Combine coordinates into point(s), lines, or polygons
- Post-process geo-data
- Interpolate lines and polygons into point clouds for heatmapping
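The last step, turning a line into a point cloud for heatmapping, can be illustrated with a minimal sketch. This is not the package's implementation: the function name `interpolate_line` and the `(longitude, latitude)` tuple layout are assumptions made for this example, and it interpolates linearly in coordinate space, which is only a reasonable approximation over short, city-scale distances.

```python
from typing import List, Tuple

# Hypothetical point type for this sketch: (longitude, latitude).
Point = Tuple[float, float]


def interpolate_line(start: Point, end: Point, num_points: int) -> List[Point]:
    """Return num_points evenly spaced points from start to end, inclusive.

    Linear interpolation in coordinate space; assumes the segment is short
    enough that the earth's curvature can be ignored.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    (x0, y0), (x1, y1) = start, end
    last = num_points - 1
    return [
        (x0 + (x1 - x0) * i / last, y0 + (y1 - y0) * i / last)
        for i in range(num_points)
    ]


# Example: three points along a segment, endpoints included.
points = interpolate_line((0.0, 0.0), (1.0, 2.0), 3)
```

Each generated point can then carry a share of the line item's cost, so that a long street project contributes to the heatmap along its whole length rather than at a single location.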
- ward_spending_pdf_data_extraction - converts CIP aldermanic menu spending PDFs into CSVs
- ward_spending_post_processing - post-processes PDF data, making fixes to columns and categorizing items
- ward_spending_geocoding - geocodes the CSV data, outputting a GeoJSON
- bike_geocoding_script - one-off script that uses the ward wise libraries to geocode CDOT's upcoming bike lane data
- ward_spending.address_geocoding - use to convert location text into geocoded geometry data
- ward_spending.address_format_processing - use to parse location text into street numbers and street intersections
- geocoder - use to geocode street numbers and street intersections
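To give a feel for what parsing location text into street numbers and street intersections involves, here is a rough sketch. It does not use or reproduce the actual `ward_spending.address_format_processing` API: the function `parse_location`, the regular expressions, and the three input formats (`A & B`, `1200-1250 STREET`, `1200 STREET`) are all assumptions chosen for illustration, and the real menu data may use other formats.

```python
import re

# Hypothetical patterns for this sketch; the real data may differ.
INTERSECTION_RE = re.compile(r"^(?P<street_a>.+?)\s*&\s*(?P<street_b>.+)$")
ADDRESS_RANGE_RE = re.compile(r"^(?P<start>\d+)\s*-\s*(?P<end>\d+)\s+(?P<street>.+)$")
ADDRESS_RE = re.compile(r"^(?P<number>\d+)\s+(?P<street>.+)$")


def parse_location(text: str) -> dict:
    """Classify a location string and pull out its street numbers or streets."""
    text = text.strip().upper()

    match = INTERSECTION_RE.match(text)
    if match:
        return {
            "format": "intersection",
            "streets": [match["street_a"], match["street_b"]],
        }

    match = ADDRESS_RANGE_RE.match(text)
    if match:
        return {
            "format": "address_range",
            "numbers": [int(match["start"]), int(match["end"])],
            "street": match["street"],
        }

    match = ADDRESS_RE.match(text)
    if match:
        return {
            "format": "address",
            "numbers": [int(match["number"])],
            "street": match["street"],
        }

    # Anything else is left for manual review or a more specific parser.
    return {"format": "unknown"}


parsed = parse_location("1200-1250 N Milwaukee Ave")
```

The parsed street numbers or intersecting street names are the kind of input that would then be handed to a geocoder to obtain coordinates.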