Ingestion is an application to transform and aggregate GTFS-RT and GTFS Static files into parquet files for storage in AWS S3 buckets.
Ingestion operates on a chronological event loop with a 5-minute delay between iterations.
Ingestion connects to the Performance Manager application via the metadata_log table of the Metadata RDS. When Ingestion creates a new parquet file, the S3 path of that file is written to the metadata_log table for Performance Manager to process.
In each iteration of the event loop, GTFS Static files, when available, are processed before any GTFS-RT files.
- List all files from the incoming S3 bucket
- Bucket files into the applicable Converter class
- Start the converter loop of each Converter class, creating parquet files
- Write parquet file to S3 bucket
- Write S3 path of parquet file to the metadata_log table for Performance Manager
- Move successfully processed incoming files to the archive bucket
- Move unsuccessfully processed incoming files to the error bucket
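The steps above can be sketched in plain Python. This is a minimal illustration, not the application's actual code: `bucket_files`, `run_iteration`, `run_forever`, the converter names, and the suffix rules (`.zip` for GTFS Static, `.json.gz` for GTFS-RT) are all hypothetical, and S3 and the metadata_log table are replaced by in-memory lists. Sending files with no matching converter to the error list is also an assumption.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

ITERATION_DELAY_SECONDS = 5 * 60  # 5-minute delay between iterations


def bucket_files(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group incoming paths by converter name (suffix rules are assumptions)."""
    buckets: Dict[str, List[str]] = {"gtfs_static": [], "gtfs_rt": [], "unknown": []}
    for path in paths:
        if path.endswith(".zip"):
            buckets["gtfs_static"].append(path)
        elif path.endswith(".json.gz"):
            buckets["gtfs_rt"].append(path)
        else:
            buckets["unknown"].append(path)
    return buckets


def run_iteration(
    incoming: Iterable[str],
    convert: Callable[[str, List[str]], str],
) -> Tuple[List[str], List[str], List[str]]:
    """Run one pass; return (metadata_log paths, archived files, errored files)."""
    buckets = bucket_files(incoming)
    metadata_log: List[str] = []
    archived: List[str] = []
    # Assumption: files with no applicable converter are treated as errors.
    errored: List[str] = list(buckets["unknown"])
    # GTFS Static is processed before GTFS-RT.
    for name in ("gtfs_static", "gtfs_rt"):
        files = buckets[name]
        if not files:
            continue
        try:
            # The converter writes a parquet file and returns its S3 path.
            parquet_path = convert(name, files)
        except Exception:
            errored.extend(files)  # would be moved to the error bucket
            continue
        metadata_log.append(parquet_path)  # picked up by Performance Manager
        archived.extend(files)  # would be moved to the archive bucket
    return metadata_log, archived, errored


def run_forever(
    list_incoming: Callable[[], Iterable[str]],
    convert: Callable[[str, List[str]], str],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeat iterations with a fixed delay between them."""
    while True:
        run_iteration(list_incoming(), convert)
        sleep(ITERATION_DELAY_SECONDS)
```

Passing `convert` and `sleep` in as callables keeps the sketch testable without AWS access; a real deployment would replace them with S3 and database operations.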
GTFS Static Zip files are generated by MBTA for internal and external distribution.
This application converts GTFS Zip files to partitioned parquet files that are exported to an S3 bucket. This is done with the GTFS Converter Class.
GTFS Static parquet files are written to S3 with the following partitioning:
- GTFS File Type
- timestamp = datetime extracted from the feed_version column of feed_info.txt, converted to UNIX timestamp
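As a rough illustration of the timestamp partition, the sketch below extracts an ISO 8601 datetime from a feed_version string and converts it to a UNIX timestamp. It assumes feed_version embeds a datetime with a UTC offset (for example `2019-07-03T19:13:22+00:00`); the function names and the `FILE_TYPE/timestamp=...` prefix layout are hypothetical and may not match the layout the application actually writes.

```python
import re
from datetime import datetime

# Assumption: feed_version contains an ISO 8601 datetime with a UTC offset.
ISO_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}")


def feed_version_to_timestamp(feed_version: str) -> int:
    """Return the UNIX timestamp of the datetime embedded in feed_version."""
    match = ISO_DATETIME.search(feed_version)
    if match is None:
        raise ValueError(f"no datetime found in feed_version: {feed_version!r}")
    return int(datetime.fromisoformat(match.group(0)).timestamp())


def partition_prefix(file_type: str, feed_version: str) -> str:
    """Build a hypothetical partition prefix: file type, then timestamp."""
    return f"{file_type}/timestamp={feed_version_to_timestamp(feed_version)}"
```

For example, `partition_prefix("FEED_INFO", "Example, 1970-01-02T00:00:00+00:00, version A")` returns `FEED_INFO/timestamp=86400`.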
GTFS-realtime (GTFS-RT) is provided by MBTA as an industry standard for distributing realtime transit data.
The CTD Delta application is responsible for reading GTFS-RT updates from the MBTA V3 API and saving them to an AWS S3 Bucket, as gzipped JSON files, for use by LAMP.
This application aggregates gzipped GTFS-RT update files, saved on S3 by Delta, into partitioned parquet files that are exported to an S3 bucket. The parquet files are partitioned by day and by GTFS-RT feed type. This is done with the GTFS-RT Converter Class.
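The daily aggregation can be pictured with the standard library alone. The sketch below assumes each gzipped JSON file mirrors the GTFS-realtime FeedMessage shape (a `header` with a POSIX `timestamp` and a list of `entity` records) and that days are split on UTC boundaries; both are assumptions, as are the function name and the `feed_timestamp` field. Writing the grouped records out as parquet is omitted.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def partition_by_day(blobs: Iterable[bytes]) -> Dict[str, List[dict]]:
    """Group entities from gzipped GTFS-RT JSON blobs by UTC service day."""
    days: Dict[str, List[dict]] = defaultdict(list)
    for blob in blobs:
        message = json.loads(gzip.decompress(blob))
        feed_timestamp = message["header"]["timestamp"]
        day = datetime.fromtimestamp(feed_timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
        for entity in message.get("entity", []):
            # Keep the feed timestamp with each record so rows stay traceable.
            days[day].append({"feed_timestamp": feed_timestamp, **entity})
    return dict(days)
```

Each key of the returned dictionary corresponds to one daily partition for a single feed type; files from different feed types would be grouped separately before calling this function.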
GTFS-RT parquet files are transformed and partitioned based on their Converter Class configuration:
- Busloc Trip Updates
- Busloc Vehicle Positions
- Realtime Vehicle Positions
- Realtime Trip Updates
- Service Alerts
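One way to represent these configurations is an enumeration with a lookup from file name to feed type. The sketch below is illustrative only: the `FeedType` enum, the `feed_type_from_filename` helper, and the file name fragments it matches (`busloc`, `tripupdates`, `vehiclepositions`, `alerts`) are hypothetical and are not taken from the application.

```python
from enum import Enum
from typing import Optional


class FeedType(Enum):
    """The five GTFS-RT configurations listed above."""

    BUS_TRIP_UPDATES = "Busloc Trip Updates"
    BUS_VEHICLE_POSITIONS = "Busloc Vehicle Positions"
    RT_VEHICLE_POSITIONS = "Realtime Vehicle Positions"
    RT_TRIP_UPDATES = "Realtime Trip Updates"
    SERVICE_ALERTS = "Service Alerts"


def feed_type_from_filename(filename: str) -> Optional[FeedType]:
    """Guess the feed type from a file name (matching rules are assumptions)."""
    name = filename.lower()
    is_busloc = "busloc" in name
    if "tripupdates" in name:
        return FeedType.BUS_TRIP_UPDATES if is_busloc else FeedType.RT_TRIP_UPDATES
    if "vehiclepositions" in name:
        return FeedType.BUS_VEHICLE_POSITIONS if is_busloc else FeedType.RT_VEHICLE_POSITIONS
    if "alerts" in name:
        return FeedType.SERVICE_ALERTS
    return None  # no applicable configuration
```

Returning `None` for unrecognized names leaves the decision about such files to the caller.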
GTFS Zip files are converted to yearly partitioned parquet files, using a differential compression process, and exported to AWS S3 for publishing/storage.
For more information about these files, please see: https://performancedata.mbta.com/