Vp Usable #961

Merged
merged 11 commits into from
Dec 7, 2023

Conversation

amandaha8
Contributor

@amandaha8 amandaha8 commented Dec 7, 2023

  • Working on issue Research Request - RT vs Schedule Trip Level Metrics (using vp_usable) #936
  • This is the first version of publishing metrics, starting from vp_usable and trip_speeds.
  • Each row represents a single trip with information such as the median # of GTFS pings per minute, the total trip time, and how accurately the GTFS recorded points matched the shapes we have on file.
  • Moved work from a notebook into a script. The script takes about 20-30 minutes to run for all operators.
  • Used map partitions for only certain functions that are time-consuming instead of wrapping several functions under one map partitions.
  • Began looking at outlier trips: for example, some trips' total time derived from GTFS was much shorter than their scheduled service minutes.
  • Next steps: make script run faster, figure out how to deal with outliers.
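
To illustrate the "median # of GTFS pings per minute" metric described above, here is a minimal sketch in pandas. It is not the script from this PR: the column names (`trip_instance_key`, `location_timestamp_local`), the function name, and the sample data are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical column names; the real vp_usable schema may differ.
TRIP_COL = "trip_instance_key"
TIME_COL = "location_timestamp_local"


def median_pings_per_minute(vp: pd.DataFrame) -> pd.DataFrame:
    """Return one row per trip with the median count of pings per minute."""
    # Bin each ping to the start of the minute it falls in.
    binned = vp.assign(minute=vp[TIME_COL].dt.floor("min"))

    # Count pings for every (trip, minute) pair that has at least one ping.
    pings = (
        binned.groupby([TRIP_COL, "minute"])
        .size()
        .rename("n_pings")
        .reset_index()
    )

    # Collapse to one row per trip.
    return (
        pings.groupby(TRIP_COL)["n_pings"]
        .median()
        .rename("median_pings_per_min")
        .reset_index()
    )


# Made-up sample: trip "a" has 3, 2, and 1 pings in three consecutive
# minutes; trip "b" has 1 ping in each of two minutes.
vp = pd.DataFrame(
    {
        TRIP_COL: ["a"] * 6 + ["b"] * 2,
        TIME_COL: pd.to_datetime(
            [
                "2023-12-06 05:00:05",
                "2023-12-06 05:00:25",
                "2023-12-06 05:00:45",
                "2023-12-06 05:01:10",
                "2023-12-06 05:01:50",
                "2023-12-06 05:02:30",
                "2023-12-06 06:00:10",
                "2023-12-06 06:01:10",
            ]
        ),
    }
)

result = median_pings_per_minute(vp)
```

Note that minutes with zero pings never appear in the grouped counts, so this sketch takes the median over observed minutes only. Whether gaps should count as zeros is a separate design decision.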

@amandaha8 amandaha8 merged commit 04aac64 into main Dec 7, 2023
@amandaha8 amandaha8 deleted the vp_usable branch December 7, 2023 22:14
@tiffanychu90
Member

tiffanychu90 commented Dec 8, 2023

Feedback for script in e157cca

  • In the next draft, work on grouping functions that belong together, such as this one. total_counts and total_counts_by_trip sound basically equivalent and do nearly the same thing, except that total_counts creates 2 columns. Work on logically grouping or absorbing functions, or rewriting them so the same function can be used twice.
    • Adapt this function to be used twice
    • Compare it to this to find where they have stuff in common and which part should be removed from the generic function
  • In this function, min_time and max_time are created on the grouped df (vp_usable grouped by trip and binned minute). I think, to be safer, they should be created on vp_usable grouped by trip.
    • The min of a binned minute will always be set to 5:00:00, 5:01:00, etc, which is probably fine, but the max is doing the same...and maybe the max time is 5:10:59, but the binning sets it to 5:10:00.
    • Perhaps the min / max time and ultimately total_trip_time should be generated in a different function, without the binning by minute, and just merge it on.
    • Rename total_trip_time to something like rt_service_minutes, because this is the RT equivalent of scheduled service minutes, and we would want both columns present for comparison. Give it the same naming pattern, with min or minutes in the name so we know it's not seconds or hours, since service_hours is what we see in the warehouse.
    • Future TODO: add scheduled service minutes here, and I will move it out of being generated in the trip_speeds dataset; it probably makes more sense to generate it here and use it elsewhere.
  • There is a function in segment_speed_utils.wrangle_shapes now for making vp into gdf because we do it so often
  • Fix indentation of functions for readability, such as here
  • No need to return m1 here since it looks like you're going to export it to GCS anyway. It's not returned when you actually call the function later
  • If you are interested in adding more print / time statements between the various map partitions, that can help you get a sense of what's taking relatively longer, similar to how you had comments in the notebook.
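
The min / max point above can be sketched as follows. This is only an illustration under assumptions: the column names (`trip_instance_key`, `location_timestamp_local`, `binned_minute`) and the sample timestamps are invented, and `rt_service_minutes` is written here as a hypothetical standalone helper rather than taken from the script. It computes the trip duration from whichever timestamp column it is given, so the raw and minute-binned results can be compared side by side.

```python
import pandas as pd

# Hypothetical column names; the real vp_usable schema may differ.
TRIP_COL = "trip_instance_key"
TIME_COL = "location_timestamp_local"


def rt_service_minutes(vp: pd.DataFrame, timestamp_col: str = TIME_COL) -> pd.DataFrame:
    """Return one row per trip with min_time, max_time, and rt_service_minutes."""
    trip_times = (
        vp.groupby(TRIP_COL)[timestamp_col]
        .agg(min_time="min", max_time="max")
        .reset_index()
    )
    # Duration in minutes, so the unit matches the column name.
    trip_times["rt_service_minutes"] = (
        trip_times["max_time"] - trip_times["min_time"]
    ).dt.total_seconds() / 60
    return trip_times


# Made-up trip whose last ping arrives at 5:10:59.
vp = pd.DataFrame(
    {
        TRIP_COL: ["a", "a", "a"],
        TIME_COL: pd.to_datetime(
            [
                "2023-12-06 05:00:20",
                "2023-12-06 05:05:30",
                "2023-12-06 05:10:59",
            ]
        ),
    }
)

# Duration from the raw timestamps, grouped by trip only.
raw = rt_service_minutes(vp)

# Duration from timestamps floored to the minute, as the binning would do.
vp_binned = vp.assign(binned_minute=vp[TIME_COL].dt.floor("min"))
binned = rt_service_minutes(vp_binned, timestamp_col="binned_minute")

print(raw["rt_service_minutes"].iloc[0], binned["rt_service_minutes"].iloc[0])
```

With these sample timestamps the binned version floors both ends of the trip (5:00:20 to 5:00:00 and 5:10:59 to 5:10:00), so the two durations differ by well under a minute. The resulting per-trip table could then be merged onto the binned metrics by trip key.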

@amandaha8 amandaha8 mentioned this pull request Dec 19, 2023
@amandaha8 amandaha8 linked an issue Feb 6, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Research Request - RT vs Schedule Trip Level Metrics (using vp_usable)
2 participants