🚌 Convert TM data into bus vehicle events #425
def create_dt_from_sam(
    service_date_col: pl.Expr, sam_time_col: pl.Expr
) -> pl.Expr:
    """
    add seconds after midnight to a service date to create a datetime object.
    seconds after midnight is in boston local time, convert it to utc.
    """
    return (
        service_date_col.cast(pl.Datetime) + pl.duration(seconds=sam_time_col)
    ).map_elements(
        lambda x: BOSTON_TZ.localize(x).astimezone(UTC_TZ),
        return_dtype=pl.Datetime,
    )
a nice utility function to convert seconds after midnight columns into datetime columns.
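For readers unfamiliar with the seconds-after-midnight convention, here is a minimal per-value sketch of the same conversion using only the standard library. It is not the code from this PR: `sam_to_utc` is a hypothetical name, and it assumes `BOSTON_TZ` corresponds to the `America/New_York` zone. Like the function above, it adds the seconds to naive local midnight first and only then attaches the time zone.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: Boston local time is the America/New_York zone.
BOSTON_TZ = ZoneInfo("America/New_York")


def sam_to_utc(service_date: date, seconds_after_midnight: int) -> datetime:
    """Hypothetical helper: service date + seconds after midnight -> UTC datetime."""
    # Build naive local midnight, then add the offset. Values above 86400
    # simply roll over into the next calendar day.
    local_naive = datetime.combine(service_date, time()) + timedelta(
        seconds=seconds_after_midnight
    )
    # Attach the local zone, then convert the instant to UTC.
    return local_naive.replace(tzinfo=BOSTON_TZ).astimezone(timezone.utc)


winter = sam_to_utc(date(2024, 1, 15), 3600)       # 01:00 EST
past_midnight = sam_to_utc(date(2024, 1, 15), 90000)  # 25:00 on the service day
summer = sam_to_utc(date(2024, 7, 15), 0)          # 00:00 EDT
```

The `past_midnight` case shows why a service date plus an offset is used instead of a plain clock time: a trip that runs after midnight still belongs to the previous service day.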
def generate_tm_events(tm_files: List[str]) -> pl.DataFrame:
this will generate the tm vehicle events dataframe, which can be joined with the gtfs rt dataframe to create a dataframe containing all vehicle events we're aware of.
def get_daily_work_pieces(daily_work_piece_files: List[str]) -> pl.DataFrame:
    """
this will generate a dataframe of daily work pieces: who was driving what bus on a given day. there are a lot of caveats though, since i can't seem to get the joins to work perfectly. they are noted in the comments.
We want to take Transit Master data from the springboard bucket and convert it into Bus Events that can be joined against GTFS Realtime data. This will give us additional information about when a bus arrived at stops as well as when it hits non-revenue timepoints. Create a new function that takes a list of Transit Master stop crossing parquet paths and joins them against the Transit Master Geo Nodes, Routes, Trips, and Vehicle tables. Adjust column names, cast them appropriately, and make some modifications so they are usable by later stages in the pipeline. Add test files and test cases to ensure ingestion and transformation happen as expected.
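As a rough illustration of the join, rename, and cast steps described above, the sketch below enriches stop crossing rows from lookup tables in plain Python. It is not the Polars implementation in this PR. The function name `enrich_stop_crossings` and every column name (`GEO_NODE_ID`, `STOP_ABBR`, `ARRIVAL_SAM`, and so on) are hypothetical placeholders, since the actual Transit Master schema is not shown here, and the Trips table is omitted for brevity.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def enrich_stop_crossings(
    crossings: List[Row],
    geo_nodes: Dict[int, Row],
    routes: Dict[int, Row],
    vehicles: Dict[int, Row],
) -> List[Row]:
    """Left-join stop crossings against lookup tables, then rename and cast."""
    events: List[Row] = []
    for row in crossings:
        # A missing lookup yields an empty dict, so unmatched rows are kept
        # with None values, which mirrors a left join.
        node = geo_nodes.get(row["GEO_NODE_ID"], {})
        route = routes.get(row["ROUTE_ID"], {})
        vehicle = vehicles.get(row["VEHICLE_ID"], {})
        arrival: Optional[float] = row.get("ARRIVAL_SAM")
        events.append(
            {
                # Rename source columns to the names later stages expect
                # (all names here are hypothetical).
                "stop_id": node.get("STOP_ABBR"),
                "route_id": route.get("ROUTE_ABBR"),
                "vehicle_label": vehicle.get("VEHICLE_LABEL"),
                # Cast seconds after midnight to int, preserving nulls.
                "arrival_sam": int(arrival) if arrival is not None else None,
            }
        )
    return events


crossings = [
    {"GEO_NODE_ID": 1, "ROUTE_ID": 10, "VEHICLE_ID": 100, "ARRIVAL_SAM": 3600.0},
    {"GEO_NODE_ID": 2, "ROUTE_ID": 10, "VEHICLE_ID": 999, "ARRIVAL_SAM": None},
]
events = enrich_stop_crossings(
    crossings,
    geo_nodes={1: {"STOP_ABBR": "110"}, 2: {"STOP_ABBR": "72"}},
    routes={10: {"ROUTE_ABBR": "1"}},
    vehicles={100: {"VEHICLE_LABEL": "y1234"}},
)
```

The second sample row has no matching vehicle and no arrival time, so it survives with `None` in those fields instead of being dropped. Keeping unmatched rows visible makes join gaps easier to spot in tests.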
Process Daily Work Piece logs into a dataframe of operator / vehicle records tied to block ids, run ids, and trip ids that can be joined against bus vehicle events.
the remote files runtime utility was updated and merged on main while the tm to bus events branch was completed. after rebasing, this patch wires everything up correctly.
🚌 Convert TM data into bus vehicle events
Convert Transit Master Stop Crossing parquet files and Daily Piece of Work parquet files into dataframes describing Vehicle Events and Operator / Vehicle assignments.