Ingestion

Ingestion is an application that transforms and aggregates GTFS-RT and GTFS Static files into parquet files for storage in AWS S3 buckets.

Application Operation

Ingestion runs a chronological event loop, with a 5 minute delay between iterations.

Ingestion connects to the Performance Manager application via the metadata_log table of the Metadata RDS. When Ingestion creates a new parquet file, the S3 path of that file is written to the metadata_log table for Performance Manager to process.
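
The hand-off through metadata_log can be pictured with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the Metadata RDS, and the column names (pk_id, path, processed) and function names are hypothetical, chosen only for illustration; the real table schema is not described here.

```python
import sqlite3


def create_metadata_log(conn):
    """Create a stand-in metadata_log table (hypothetical columns)."""
    conn.execute(
        "CREATE TABLE metadata_log ("
        "pk_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "path TEXT NOT NULL, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def record_parquet_path(conn, s3_path):
    """Producer side: log the S3 path of a newly written parquet file."""
    conn.execute("INSERT INTO metadata_log (path) VALUES (?)", (s3_path,))
    conn.commit()


def unprocessed_paths(conn):
    """Consumer side: list logged paths not yet handled, oldest first."""
    rows = conn.execute(
        "SELECT path FROM metadata_log WHERE processed = 0 ORDER BY pk_id"
    )
    return [row[0] for row in rows]
```

In this sketch Ingestion would call record_parquet_path after each successful write, and Performance Manager would poll unprocessed_paths.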

In each iteration of the event loop, any available GTFS Static files are processed before any GTFS-RT files.

Event Loop Summary

  1. List all files from incoming S3 bucket
  2. Group files by applicable Converter class
  3. Start converter loop of each Converter class, creating parquet files
  4. Write parquet file to S3 Bucket
  5. Write S3 path of parquet file to metadata_log table for Performance Manager
  6. Move successfully processed incoming files to archive bucket
  7. Move unsuccessfully processed incoming files to error bucket
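
The steps above can be sketched as a single function that runs one iteration. This is an illustration under stated assumptions, not the application's actual code: pick_converter, run_iteration, the file-extension routing rule, and the convert and record_path callbacks are all hypothetical names. S3 access is left to the caller, so the function only decides which inputs go to the archive or error bucket.

```python
from collections import defaultdict


def pick_converter(key):
    """Hypothetical routing rule: zip is GTFS Static, gzipped JSON is GTFS-RT."""
    if key.endswith(".zip"):
        return "gtfs_static"
    if key.endswith(".json.gz"):
        return "gtfs_rt"
    return "unknown"


def run_iteration(incoming_keys, convert, record_path):
    """Run one pass over the listed incoming keys (step 1 is done by the caller).

    convert(name, keys) is assumed to write a parquet file and return its
    S3 path; record_path(path) is assumed to write that path to metadata_log.
    """
    # Step 2: group incoming files by converter.
    groups = defaultdict(list)
    for key in incoming_keys:
        groups[pick_converter(key)].append(key)

    moves = {"archive": [], "error": []}
    # GTFS Static is handled before every other group.
    order = ["gtfs_static"] + sorted(n for n in groups if n != "gtfs_static")
    for name in order:
        keys = groups.get(name, [])
        if not keys:
            continue
        try:
            parquet_path = convert(name, keys)  # steps 3 and 4
            record_path(parquet_path)           # step 5
            moves["archive"].extend(keys)       # step 6
        except Exception:
            moves["error"].extend(keys)         # step 7
    return moves
```

A caller would list the incoming bucket, pass the keys to run_iteration, move the files as indicated, and then wait 5 minutes (for example with time.sleep(300)) before the next iteration.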

GTFS Static

GTFS Static Zip files are generated by MBTA for internal and external distribution.

This application converts GTFS Zip files to partitioned parquet files that are exported to an S3 bucket. This is done with the GTFS Converter Class.
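
As a rough illustration of the first half of that conversion, the sketch below reads every table out of a GTFS Zip file using only the Python standard library. GTFS Static tables are CSV text files with a .txt extension inside the archive. The function name read_gtfs_tables is hypothetical, and writing the rows out as parquet (for instance with a library such as pyarrow) is deliberately omitted.

```python
import csv
import io
import zipfile


def read_gtfs_tables(zip_bytes):
    """Return {table_name: [row dicts]} for each .txt table in a GTFS zip."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue  # skip anything that is not a GTFS table
            with archive.open(name) as raw:
                # utf-8-sig drops a leading byte order mark if one is present
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name[:-4]] = list(csv.DictReader(text))
    return tables
```

Each entry in the returned dictionary corresponds to one GTFS table, such as stops or trips, ready to be handed to a parquet writer.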

GTFS Static parquet files are written to S3 with the following partitioning:

GTFS-RT Data

GTFS-realtime (GTFS-RT) is provided by MBTA as an industry standard for distributing realtime transit data.

The CTD Delta application is responsible for reading GTFS-RT updates from the MBTA V3 API and saving them to an AWS S3 Bucket, as gzipped JSON files, for use by LAMP.

This application aggregates gzipped GTFS-RT update files, saved on S3 by Delta, into partitioned parquet files that are exported to an S3 bucket. The parquet files are partitioned daily, by GTFS-RT feed type. This is done with the GTFS-RT Converter Class.
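
The daily, per-feed grouping described above can be sketched as follows. The sketch assumes each gzipped file holds one JSON document shaped like a GTFS-realtime FeedMessage, with a header.timestamp in POSIX seconds and an entity list. The function names, the feed type label, and the year=/month=/day= path layout are hypothetical and only illustrate the idea; the parquet write itself is omitted.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(feed_type, timestamp):
    """Build a hypothetical daily partition path from a POSIX timestamp (UTC)."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{feed_type}/year={day.year}/month={day.month}/day={day.day}"


def aggregate_updates(files):
    """Group entities from (feed_type, gzipped JSON bytes) pairs by partition."""
    partitions = defaultdict(list)
    for feed_type, blob in files:
        message = json.loads(gzip.decompress(blob))
        key = partition_key(feed_type, message["header"]["timestamp"])
        partitions[key].extend(message.get("entity", []))
    return dict(partitions)
```

Every update file whose header timestamp falls on the same UTC day, for the same feed type, lands in the same partition, so many small update files collapse into one dataset per day.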

GTFS-RT parquet files are transformed and partitioned based on their Converter Class configuration:

Compressed GTFS Archive Files

GTFS Zip files are converted to yearly partitioned parquet files, using a differential compression process, and exported to AWS S3 for publishing/storage.

For more information about these files, please see: https://performancedata.mbta.com/