clean_feed() is a thin wrapper around gtfs_kit's clean() function, which applies 4 cleaning functions:
clean_ids(): strip whitespace from all string IDs and then replace every remaining whitespace chunk with an underscore
clean_times(): convert H:MM:SS time strings to HH:MM:SS time strings to make sorting by time work as expected.
clean_route_short_names(): In feed.routes, assign ‘n/a’ to missing route short names and strip whitespace from route short names. Then disambiguate each route short name that is duplicated by appending ‘-’ and its route ID. Note: this is the method that fixes the "Repeated pair (route_short_name, route_long_name)" warning.
drop_zombies(): does the following:
Drop stops of location type 0 or NaN with no stop times.
Remove undefined parent stations from the parent_station column.
Drop trips with no stop times.
Drop shapes with no trips.
Drop routes with no trips.
Drop services with no trips.
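To make the first two transformations concrete, here is a minimal plain-Python sketch of the string rewriting described for clean_ids() and clean_times(). The helper names normalize_id and pad_time are hypothetical, and this is not gtfs_kit's actual implementation, which works on the feed's tables rather than on single strings.

```python
import re


def normalize_id(value: str) -> str:
    """Strip surrounding whitespace, then replace each remaining
    run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", value.strip())


def pad_time(value: str) -> str:
    """Left-pad an H:MM:SS string with a zero so it becomes HH:MM:SS.
    Strings that are already HH:MM:SS are returned unchanged."""
    return value.strip().zfill(8)


raw_times = ["10:00:00", "9:30:00"]
padded_times = [pad_time(t) for t in raw_times]

# As plain strings, "10:00:00" sorts before "9:30:00" because "1" < "9".
print(sorted(raw_times))
# After padding, string order matches chronological order.
print(sorted(padded_times))
print(normalize_id("  stop 12  a "))
```

The padding step is what makes a plain string sort on departure or arrival times behave chronologically.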
clean_feed(), and hence clean(), will fail if there is no shape_id column in trips.txt. However, drop_zombies() is the only function that relies on that column; the 3 other cleaning functions work fine without it.
(OPTIONAL) Suggested Implementations
Instead of returning without cleaning when shape_id is not present, clean_feed() could run only the first 3 cleaning methods and skip drop_zombies().
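One possible shape for this is sketched below, as a rough illustration rather than the project's actual code. The names select_cleaners and apply_cleaners are hypothetical. The sketch also assumes that each cleaning step is available as a method on the feed object and returns a cleaned feed, which should be checked against the installed gtfs_kit version.

```python
# Cleaning steps named in this issue. drop_zombies() is the only one
# reported to need the shape_id column of the trips table.
SAFE_CLEANERS = ("clean_ids", "clean_times", "clean_route_short_names")
SHAPE_DEPENDENT_CLEANERS = ("drop_zombies",)


def select_cleaners(trip_columns):
    """Return the names of the cleaning steps that can run,
    given the column names of the trips table."""
    steps = list(SAFE_CLEANERS)
    if "shape_id" in trip_columns:
        steps.extend(SHAPE_DEPENDENT_CLEANERS)
    return steps


def apply_cleaners(feed, trip_columns):
    """Apply each selected step in order.

    Assumption: every step is a method on `feed` that takes no
    arguments and returns a (possibly new) cleaned feed.
    """
    for name in select_cleaners(trip_columns):
        feed = getattr(feed, name)()
    return feed
```

With a gtfs_kit feed, trip_columns would presumably be feed.trips.columns. Keeping the step selection separate from the step application makes the shape_id rule easy to unit test without loading a real feed.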
This issue will not be completed before the end of sprint 4 due to it relying on other PRs that are yet to be merged. This issue should however be completed early on in sprint 5.