The basics of downloading and cleaning GTFS data.
- Use
python get_agencies.py
(make sure Python is installed), or whatever other method you like, to snag a list of all of the agencies with GTFS. With wget, for example:
wget http://www.gtfs-data-exchange.com/api/agencies -O agencies.json
- Check for the URL of the most recent GTFS feed for your agency of choice
wget http://www.gtfs-data-exchange.com/api/agency?agency=metropolitan-atlanta-rapid-transit-authority -O marta.json
- Download that feed!
wget http://gtfs.s3.amazonaws.com/metropolitan-atlanta-rapid-transit-authority_20140115_0110.zip -O gtfs.zip
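If you'd rather stay in Python than use wget, the three steps above can be sketched roughly like this. The helper names (download, latest_feed_url, fetch_latest_feed) are made up for illustration, and the JSON layout (data -> datafiles -> file_url / date_added) is an assumption about what the agency endpoint returns, so open your marta.json and confirm before trusting it.

```python
import json
import shutil
import urllib.request


def download(url, dest):
    """Fetch url and write the response body to dest (like `wget url -O dest`)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def latest_feed_url(agency_info):
    """Return the file_url of the newest datafile in a parsed agency response.

    ASSUMPTION: the JSON looks like
    {"data": {"datafiles": [{"file_url": ..., "date_added": ...}, ...]}}.
    """
    datafiles = agency_info["data"]["datafiles"]
    newest = max(datafiles, key=lambda f: f["date_added"])
    return newest["file_url"]


def fetch_latest_feed(agency, dest="gtfs.zip"):
    """Look up an agency on the exchange and download its newest feed."""
    api = "http://www.gtfs-data-exchange.com/api/agency?agency="
    with urllib.request.urlopen(api + agency) as response:
        info = json.load(response)
    return download(latest_feed_url(info), dest)
```

Calling fetch_latest_feed("metropolitan-atlanta-rapid-transit-authority") should leave you with the same gtfs.zip as the wget commands, provided the response really has that shape.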
- Reasons not to check data
- You really trust the tech folks at the agency
- You’re ok with hitting possible errors
- Reasons to check
- Sometimes people put weird stuff in GTFS
- Nested folders
- PDF license agreements
- Sometimes the data is super messy
- Measure twice, cut once?
- One other reason: https://github.com/OneBusAway/onebusaway-application-modules/wiki/Developer-Guide#common-problems-building-bundle
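A quick way to spot the "weird stuff" above before running a full validator is to peek inside the zip. This is only a minimal sketch: inspect_feed is a hypothetical helper, and the REQUIRED set is my reading of the GTFS reference, so adjust it if your feed legitimately differs.

```python
import zipfile

# ASSUMPTION: files treated as required in every feed. calendar.txt or
# calendar_dates.txt is checked separately below.
REQUIRED = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}


def inspect_feed(path_or_file):
    """Return a list of human-readable problems found in a GTFS zip."""
    problems = []
    with zipfile.ZipFile(path_or_file) as zf:
        # Skip pure directory entries; only look at actual files.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    for name in names:
        if "/" in name:
            problems.append("nested folder: " + name)
        elif not name.lower().endswith(".txt"):
            problems.append("unexpected file: " + name)
    top_level = {n for n in names if "/" not in n}
    for missing in sorted(REQUIRED - top_level):
        problems.append("missing required file: " + missing)
    if not {"calendar.txt", "calendar_dates.txt"} & top_level:
        problems.append("missing calendar.txt or calendar_dates.txt")
    return problems
```

Run it as inspect_feed("gtfs.zip"); an empty list means the layout looks sane, not that the data inside is clean.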
The feed validation tool (FeedValidator) can be found at https://code.google.com/p/googletransitdatafeed/wiki/FeedValidator
- Download compressed folder
- Untar/unzip
- Change directory and run
cd transitfeed-1.2.12
(make sure the version is correct)
python feedvalidator.py path/to/gtfs.zip
- Errors will pop up in your web browser if you are using a terminal on a local machine rather than a remote SSH session.
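If you are on a remote box with no browser, one option is to have the validator print to the console and capture that from Python. This is a sketch under assumptions: validator_command and validate are hypothetical helpers, and the -n and --output=CONSOLE options are what I believe feedvalidator.py accepts, so check python feedvalidator.py --help for your version first.

```python
import subprocess


def validator_command(feed_path, script="feedvalidator.py", interpreter="python"):
    """Build the argument list for a console-only validator run."""
    # ASSUMPTION: -n skips the prompt/browser and --output=CONSOLE prints
    # errors and warnings to the terminal instead of writing HTML.
    return [interpreter, script, "-n", "--output=CONSOLE", feed_path]


def validate(feed_path, script="feedvalidator.py", interpreter="python"):
    """Run the validator and return (exit_code, combined_output)."""
    result = subprocess.run(
        validator_command(feed_path, script, interpreter),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,   # decode the output as text
    )
    return result.returncode, result.stdout
```

Run it from inside the transitfeed directory, e.g. code, report = validate("path/to/gtfs.zip"), then print or save report. The interpreter argument is there because the python on your PATH may not be the one the validator expects.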
A new Java-based tool is in development.