FEAT: Write Bus Performance metrics to S3 Bucket #451

Merged
merged 5 commits into from
Oct 24, 2024

rymarczy
Collaborator

This change creates real-time bus performance parquet files and writes them to a partition on the S3 PUBLIC_ARCHIVE_BUCKET.

This process has been successfully run locally (using staging bucket data) and on the dev environment.

Files are generated for each service date with the following steps:

  1. Pull GTFS-RT events for the service date from RT_VEHICLE_POSITION springboard files
  2. Pull Transit Master events for the service date from TM/STOP_CROSSING springboard files
  3. Join GTFS-RT and Transit Master events based on trip_id, route_id, vehicle_label, stop_id and closest stop_sequence
  4. Join GTFS Schedule data to events based on exact trip_id, or on the trip_id with the closest trip start time and the most stops in common with the planned trip service
  5. Generate bus performance metrics for all events
  6. Write a parquet file of bus performance metrics to S3
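
As a rough illustration of steps 3 and 4 (not the code in this PR), here is a minimal standard-library sketch. The `Event` class, both function names, and the shape of the `schedule` mapping are hypothetical. The sketch also assumes that, for the schedule fallback, closest start time is compared first and the number of shared stops breaks ties; the description above does not specify that order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one GTFS-RT or Transit Master stop event."""
    trip_id: str
    route_id: str
    vehicle_label: str
    stop_id: str
    stop_sequence: int


def join_closest_stop_sequence(rt_events, tm_events):
    """Pair each GTFS-RT event with the TM event that shares its join keys
    and has the nearest stop_sequence (None when no TM event shares the keys)."""
    tm_by_key = {}
    for tm in tm_events:
        key = (tm.trip_id, tm.route_id, tm.vehicle_label, tm.stop_id)
        tm_by_key.setdefault(key, []).append(tm)

    joined = []
    for rt in rt_events:
        key = (rt.trip_id, rt.route_id, rt.vehicle_label, rt.stop_id)
        candidates = tm_by_key.get(key, [])
        match = min(
            candidates,
            key=lambda tm: abs(tm.stop_sequence - rt.stop_sequence),
            default=None,
        )
        joined.append((rt, match))
    return joined


def match_schedule_trip(trip_id, start_seconds, stop_ids, schedule):
    """Return the scheduled trip_id for an observed trip.

    `schedule` maps trip_id -> (start_seconds, set of stop_ids).
    An exact trip_id match wins. Otherwise pick the closest start time,
    breaking ties by the most stops in common (assumed ordering).
    """
    if trip_id in schedule:
        return trip_id
    if not schedule:
        return None
    return min(
        schedule,
        key=lambda tid: (
            abs(schedule[tid][0] - start_seconds),
            -len(schedule[tid][1] & stop_ids),
        ),
    )
```

In this sketch, a GTFS-RT event at stop_sequence 5 with TM candidates at 2 and 6 pairs with the candidate at 6, and an observed trip with an unknown trip_id falls back to the scheduled trip whose start time is nearest.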

This PR also contains a linting change that increases the maximum line length from 80 to 120 characters, so the changes to many files are no-op formatting changes.

Asana Task: https://app.asana.com/0/1205827492903547/1208021735441632

Coverage of commit 97efa7e

Summary coverage rate:
  lines......: 75.4% (2489 of 3302 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/aws/ecs.py                                                             |27.1%     48|    -     0|    -      0
  src/lamp_py/aws/s3.py                                                              |48.4%    287|    -     0|    -      0
  src/lamp_py/bus_performance_manager/events_gtfs_rt.py                              |79.2%     48|    -     0|    -      0
  src/lamp_py/bus_performance_manager/events_gtfs_schedule.py                        |95.9%     49|    -     0|    -      0
  src/lamp_py/bus_performance_manager/events_tm.py                                   |66.7%     33|    -     0|    -      0
  src/lamp_py/bus_performance_manager/gtfs_utils.py                                  |80.8%     26|    -     0|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_to_parquet.py                             |70.4%     98|    -     0|    -      0
  src/lamp_py/ingestion/compress_gtfs/pq_to_sqlite.py                                |87.8%     49|    -     0|    -      0
  src/lamp_py/ingestion/compress_gtfs/schedule_details.py                            |80.2%     96|    -     0|    -      0
  src/lamp_py/ingestion/config_busloc_trip.py                                        |83.3%     18|    -     0|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |83.3%     18|    -     0|    -      0
  src/lamp_py/ingestion/convert_gtfs.py                                              |79.6%     49|    -     0|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |49.5%    214|    -     0|    -      0
  src/lamp_py/ingestion/converter.py                                                 |88.3%     60|    -     0|    -      0
  src/lamp_py/ingestion/light_rail_gps.py                                            |54.4%     79|    -     0|    -      0
  src/lamp_py/ingestion/utils.py                                                     |61.6%    112|    -     0|    -      0
  src/lamp_py/ingestion_tm/jobs/parition_table.py                                    |45.0%     80|    -     0|    -      0
  src/lamp_py/ingestion_tm/jobs/whole_table.py                                       |75.8%     91|    -     0|    -      0
  src/lamp_py/migrations/versions/performance_manager_prod/sql_strings/strings_001.py| 100%     12|    -     0|    -      0
  src/lamp_py/mssql/mssql_utils.py                                                   |27.5%     69|    -     0|    -      0
  src/lamp_py/performance_manager/alerts.py                                          |53.3%    199|    -     0|    -      0
  src/lamp_py/performance_manager/flat_file.py                                       |82.7%    104|    -     0|    -      0
  src/lamp_py/performance_manager/gtfs_utils.py                                      |91.4%     81|    -     0|    -      0
  src/lamp_py/performance_manager/l0_gtfs_rt_events.py                               |95.6%    135|    -     0|    -      0
  src/lamp_py/performance_manager/l0_gtfs_static_load.py                             |88.1%    135|    -     0|    -      0
  src/lamp_py/performance_manager/l0_gtfs_static_mod.py                              | 100%     33|    -     0|    -      0
  src/lamp_py/performance_manager/l0_rt_trip_updates.py                              |92.9%     70|    -     0|    -      0
  src/lamp_py/performance_manager/l0_rt_vehicle_positions.py                         |98.6%     71|    -     0|    -      0
  src/lamp_py/performance_manager/l1_cte_statements.py                               | 100%     14|    -     0|    -      0
  src/lamp_py/performance_manager/l1_rt_metrics.py                                   | 100%     28|    -     0|    -      0
  src/lamp_py/performance_manager/l1_rt_trips.py                                     | 100%    154|    -     0|    -      0
  src/lamp_py/postgres/postgres_utils.py                                             |76.7%    227|    -     0|    -      0
  src/lamp_py/postgres/rail_performance_manager_schema.py                            | 100%    202|    -     0|    -      0
  src/lamp_py/runtime_utils/alembic_migration.py                                     |94.7%     19|    -     0|    -      0
  src/lamp_py/runtime_utils/remote_files.py                                          | 100%     48|    -     0|    -      0

@arkadyan arkadyan left a comment

👍 Just one tiny question.

I appreciate the PR description.

In the future, if it's possible to separate out things like this formatting change into an isolated PR or commit, that would make it easier to quickly rubber stamp that bit and better understand which are the functional changes worth looking closer at.

"""
Test that bus routes can be generated for a given service date. For the
generated list ensure
* they don't contain Subway, Commuter Rail, or Ferry routes
* don't have a leading zero
* contain a subset of known routes
"""
assert True
exists_patch.return_value = True

question: Does this still need the assert?

Collaborator Author

I don't think so. That assert is essentially a no-op. My guess is that it was in there as a place-holder before the full test function was written, and then it was never removed.

👍 got it. I also spaced out on the single vs double =.

@rymarczy rymarczy merged commit e194fe8 into main Oct 24, 2024
5 checks passed