Project to parse and map 38 million parking tickets from 2003 and 2016. The information itself came from a Chicago Department of Revenue FOIA request.
- Throwing out data is a last resort option.
- Repeatability, always.
- PostgREST + psql views = simple api interface.
- Filetype (eg, json/csv) serialization should be easy.
- SQL first, python second.
- The original data is bombastically messy.
- An address string is the only useful location info.
- Addresses are free-form, with many, many millions of typos.
- Data is semicolon separated, and typos contain semicolons.
- Good address correction is expensive.
- Free/cheap correction isn't very good.
- Addresses are categorized using usaddresses and updated into the table,
ticket_addrs
. - Lat/long is currently not associated with ticket addresses.
- Address suffixes are normalized with lookups from
corrections
table. - A normalized levenstein distance of less than .2 is used for typos correction.
- Rogue quote chars break import of many ticket lines.
- OSM+PostGIS for address lookups in place of data.cityofchicago.org data.
- Implement typo validation using time delta between tickets from ticketers.
- Add timeseries functionality, either through psql extension or influxdb.
- Train usaddresses against Chicago addresses only.
- Improve upon nlevenstein address correction.
- Add url of legal code into
violations
table. - Add cost into
violations
table. - Validate address correction.
- Add indexes to tables.
- Audit whether a ticket should have been given or not.
- Makefile
- Improve documentation.
- Remove my local dependencies.
- Create a web page.
- Trello?
- Docker?
Description | URL |
---|---|
SQL Dump | http://data.mchap.io/tickets/chicago_tickets.latest.sql.gz |
Orig. data, 2003-2008 | http://data.mchap.io/tickets/A50462_TcktsIssd_200303-200812.txt.gz |
Orig. data, 2009-2016 | http://data.mchap.io/tickets/A50462_TcktsIssdSince2009.txt.gz |
Orig. data, 2016-2016 | http://data.mchap.io/tickets/A50462_TcktsIssd_201603-todate.txt.gz |
smartystreets results | http://data.mchap.io/tickets/all_streets_smartystreets.results.json.gz |