-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
15 gtfs validation pipeline #20
Conversation
Some CI notes:
|
* Refactored daily summaries * Update github actions inline with dev * docs: Add timestamp format in docstring * docs: Update docstring with route count explanation * chore: Better sentence syntax * Updated breaking change summarise_weekday()->summarise_days() * Add skelentons for route/trip/pre-processing functions * Added the ability to generate route and trip summaries --------- Co-authored-by: r-leyshon <[email protected]>
@ethan-moss I just pushed another small diff that removed some defunct source code - a shallow wrapper around |
@ethan-moss turned out the numpy breaking change was a little more detailed than initially expected. This is the way. Thankfully the thorough test suite catches all these breaking changes and forces you to treat it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job in putting this GTFS module together! This is a great addition to the codebase and I took lots of learnings away from reviewing it:
- The HTML parsing to 'knife and fork' the route types and descriptions is an excellent solution and very well implement - it is a much better method than hard coding values into the codebase. Nicely done!
- Unit testing strategies are exemplary - examples I will definitely learn from and use as a basis going forward.
- Great use of mocking in the unit testing too - I've now learnt about this package for the future as well.
- The defence module is a great idea, looking forward to implementing this across the rest of the codebase - it will be very useful for sure!
- The implementation of
summarise_trips()
andsummarise_routes()
is much more performant than the base implementation - it handled a much larger, London-sized GTFS file with ease!
I performed a review of this PR, considering the whole diff, and using both the test GTFS dataset provided here in the PR and a different GTFS file (a Wales only GTFS dataset, extracted from DfT'S BODS). Overall the gtfs
module and additions to utils
worked as expected. The review has brought up some issues, 'technical debt', and feature requests to raise, but I'll do it separately outside this PR such that any future work resolving them will have a smaller diff - less to PR, making it easier for us all.
The issues are mainly corner case that won't be present during typical usage. I've got fix recommendation for most of the them, and I'll add these to the issues themselves when raising them - it's just a case of implementing these ones. In the very short term, these issues wouldn't present a problem for typical applications - another reason why merging this PR should be OK. There are some issues related to warnings that need to be looked at in more detail though.
The 'technical debt' and feature requests we can progress without for now, and include when needed - same as the population module.
There is an additional GTFS validation feature that we should develop before progressing the codebase further - primarily because gtfs_kit
does not implement this check and it's cropped up in the GTFS data we're using. The feature would be checking for 'fast travel between stops'. This happens when the GTFS timetable is unrealistic/erroneous and the transit vehicle would have to go unrealistically fast to go between stops - example causes could be poor timetabling, human error, and/or the stop has lat/lon info in an incorrect location. This GTFS behaviour could impact our results (may lead to unrealistic transport performance for those trips), so it's best to get a handle on it now. This seems to be a fairly uncommon issue that appears in GTFS data extracts from DfT's Bus Open Data platform, however it is probably common enough that we should look to detect this case as part of the GTFS validation steps - we can at least then understand the impact it may have on any results we generate. We have a working prototype for this feature ready for further development, checks, and implementation. We should be able to add this in as a separate PR soon.
Again, really great work with this PR - as I said above, the module brings in great functionalities and I learnt a lot from reviewing it. Thanks again!
Additional commit after PR review: a merge conflict crept in after merging in the population PR. This was isolated to the |
Description
GTFS validation pipeline. GTFS utility function to filter GTFS feed on bounding box. Utility function internals.
Fixes #15
Motivation and Context
Validate GTFS feeds.
Type of change
How Has This Been Tested?
Test coverage at 93% average.
More testing in marked suites:
Test configuration details:
Advice for reviewer
Checklist:
Additional comments