-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timetable generation for StopTimeUpdate containing only arrival or departure #6392
base: dev-2.x
Are you sure you want to change the base?
Conversation
0541650
to
7a96e1a
Compare
7a96e1a
to
f60c1db
Compare
f60c1db
to
863c0cd
Compare
I need discussion of the behavior in filling the missing times. |
application/src/test/java/org/opentripplanner/updater/trip/moduletests/delay/DelayedTest.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Thomas Gran <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev-2.x #6392 +/- ##
==========================================
Coverage 70.10% 70.10%
- Complexity 18269 18274 +5
==========================================
Files 2077 2077
Lines 77659 77674 +15
Branches 7828 7834 +6
==========================================
+ Hits 54439 54453 +14
Misses 20439 20439
- Partials 2781 2782 +1 ☔ View full report in Codecov by Sentry. |
application/src/test/java/org/opentripplanner/updater/trip/moduletests/delay/DelayedTest.java
Outdated
Show resolved
Hide resolved
@@ -312,7 +312,7 @@ public void fixIncoherentTimes() { | |||
stopTimeUpdateBuilder.setStopSequence(1); | |||
stopTimeUpdateBuilder.setScheduleRelationship(StopTimeUpdate.ScheduleRelationship.SCHEDULED); | |||
var stopTimeEventBuilder = stopTimeUpdateBuilder.getArrivalBuilder(); | |||
stopTimeEventBuilder.setDelay(0); | |||
stopTimeEventBuilder.setDelay(900); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this test fail if you don't change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time will be valid after my code change because the bus can make up time during its journey.
newTimes.updateDepartureDelay(i, delay); | ||
} | ||
|
||
// propagate arrival and departure times, taking care not to cause negative dwells / hops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're mixing several features here, aren't you?
Can you please work in small increments? The first PR should only deal with the case where either arrival or departure is missing. In such a case you can use the other one in its place. This shouldn't need any new interpolation logic.
Once we have finished that PR we can look at more complex cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I don't get it then and I think I need to see drawing then.
Using an arrival/departure fallback is, IMO, still a wise first step and something I would like to see first, even if it doesn't fix all of your problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing interpolation logic isn't what you describe. The existing logic is that if the departure time is missing, it is assumed that the bus will always stop for its original dwell time even if it is late (resulting in negative times down the trip), so what I have written and what you are describing are both changing the logic, which is an integral part of this PR.
To explain matter, the following is how OTP fails and my proposed change:
|
Thanks for those tables. I think I get it now. I'm not enthusiastic about making a complicated method even more complicated but if you can write separate module tests for this logic I will soften my position. One point remains: I'm aware that they are not doing anything that is explicitly disallowed by the spec, but they could make everyone's live easier if they just specified the exact arrival and departure time. Do we want to pay the cost of the complexity to allow them to keep doing what they are doing? Lets discuss this in the dev meeting. |
As discussed in the meeting we should provide options for the deployment how to fill in the missing times.
|
Sorry I need help on this one. There is a method I believe that all such interpolation logic should be moved to this method and it should also handle missing stops and stops with |
I have read through this PR and spent some time trying to understand what the underlying goal is here and in issue #6391. I would like to point out that this is reviving a topic that we have revisited many times over the last decade. In the past, OTP developers have repeatedly debated how to handle missing or incomplete realtime data, and reached the conclusion that incoming realtime data should be complete, and the task of repairing missing or low-quality data is outside the scope of OTP. From late 2015 through early 2018 we pushed back on allowing missing fields, and proposed flags that would allow strict validation of complete realtime data sources. See: Before proposing further changes in this area, please be sure you have read all archived past discussions on this topic, and are aware that OTP developers have consistently stated for over ten years that inference or interpolation of missing realtime data is out of scope for OTP. Past experience has shown this is not an area that lends itself to simple and unambiguous quick fixes. This is a topic that has generated hundreds of messages, probably consuming days or weeks of OTP contributors' time, and risks doing the same thing again now. The first order of business here is for the OTP developer community to decide whether they want to re-open this topic. The following questions need to be answered before time is invested in new software development or review:
If OTP decides to accept such data, we will need full, detailed documentation on the intended behavior, in both simple cases and in all situations where data are inconsistent or conflicting. Any submission of code for review is an implicit request to the OTP developer community to anticipate, reopen, and answer all these questions. @miklcct it is important to keep in mind that OTP is not trying to be everything for everyone, or to accept any and all data that could possibly be produced. The development team is finite, so some things need to be simplified or declared out of scope, and some data will have to be dropped and responsibility for quality shifted onto producers or data integrators. I'm aware that the spec says certain things are allowed, but OTP may have to reject valid data. This was a foreseeable outcome that was pointed out at the time the specification was made lenient. Responsibility for inference was shifted from the producers to "someone else", without saying who that was, and the OTP community explicitly stated it could not take on that responsibility. |
@leonardehrenfried please check out the discussions linked above - this problem was anticipated, and the OTP community's conclusion was essentially what you're stating here: producers should just produce complete, unambiguous data. A flag could be added to signal conformity, meeting requirements for backward compatibility. Unfortunately in specifications there's a problem of "skin in the game": decisions are often not made by the organizations that will bear the costs and complexities. It's easy to make a specification more lenient if we assume "someone else" will handle the consequences. The leniency is then purely theoretical because no one agreed to bear the cost, and no one is penalized for not bearing it. |
Isn't our software designed to handle all valid data according to international standards, and used as a ready-to-use backend for any journey planning solution anywhere in the world? For example, in a case of a C complier, if it didn't process trigraphs (an obscure part of standard which no one uses it nowadays and has now been removed) when they were still in the specification, it was considered deficient. A bug would be filed, a test case would be added, and the bug would be fixed. |
Decisions on the scope of OTP are not up to me, as my organization no longer covers its development and maintenance costs. That said, as a long-term contributor I can give my impressions. As I stated above, OTP is not intended to do all things for all people. No, I do not expect it to fully implement the GTFS and GTFS-RT specifications, nor do not expect it to work properly in every situation around the world. In principle it sounds ideal to fully implement a specification. It is more appealing and simpler to understand for users. But OTP has a relatively small developer community, and contributions are generally expected to come from paid full-time developers at organization who can take future responsibility for maintenance and technical debt. The limited amount of available time and energy means that most developers are already fully occupied, and will not have the bandwidth to handle an ever-expanding scope. Developers are definitely going to prioritize the features and behaviors needed by their employers or customers who are investing heavily in OTP. This doesn't just include time they spend directly on implementation, but time they need to spend on review, design, and future maintenance. Within these constraints, a system that consumes all possible input data and works out of the box anywhere in the world is not realistic. So it is reasonable to define a more limited scope, as long as that scope is clear. Implementing a specification partially in a well-defined way may be better than attempting to implement all of it and exhausting all your staff and creating something they don't want to or are unable to maintain. You're probably familiar with the "Pareto principle", one interpretation of which is that the last 20% of the work takes 80% of the time. Exhaustively implementing a specification might take 5 times as long as selectively implementing it and not supporting aspects that aren't needed. Additionally, GTFS and GTFS-RT are expanding over time. GTFS is a community standard, which many OTP developers have contributed to and therefore perceive as living and mutable. The fact that expanding the spec or making it more lenient shifts the cost of sometimes questionable changes onto data consumers has been pointed out, but apparently not always understood or taken seriously. OTP cannot reasonably commit to fully implementing every detail of a specification that can be expanded or revised at any time by people who do not bear the costs.
Some people might consider it deficient. If it's marketed as 100% standards-compliant and sold to customers who follow strict, legally binding purchasing requirements or need to compile source written on an IBM terminal in 1975, then sure. But if someone made a new compiler that offered other advantages and wanted to streamline development by eliminating legacy features, it would be sufficient to state in the documentation that trigraphs are not supported. Furthermore, as a community-driven specification GTFS does not have absolute authority, and does not obligate people who freely and generously share their work with the world to exhaust their resources implementing 100% of its details. I think one mistake here is thinking of OTP as if it were a piece of primarily commercial software being sold to the widest possible market. OTP is being built by a limited number of organizations to serve their specific use cases, and to cover other people's use cases if it's reasonably efficient to do so. If users can build systems with similar design and expectations to the main OTP contributors, it's going to be a lot easier for everyone to work together on the software. Finally I'll say that OTP developers have always been aware that incomplete or inconsistent realtime data are a problem, and knew OTP would need to consume them. But for both resource management and architectural reasons, they decided some separate module should perform the data cleaning and integration, and OTP could then expect to see only complete data. |
I raised this issue at today's developer meeting. Not a technical discussion of this PR's contents, but the general question of whether we were reopening the question of OTP interpolating or repairing realtime messages. The conclusion was that OTP expects complete and clean realtime input data. The need for realtime inference/propagation is recognized, but these should be treated as separate preprocessing steps upstream of data being consumed by OTP. To the extent that people find it convenient to keep this code in the OTP repository, it should be presented as a sandbox feature that is disabled by default. As there will be several different kinds of realtime data repair, including some that are already present in the main codebase, any further development on this subject will require a framework for enabling realtime message cleanup plugins and applying them in an order specified in configuration files. Any organization that wants to expand the realtime data cleaning capabilities will need to commit to future maintenance of the framework for configuring and enabling the plugins. As sandbox features, future maintenance of the realtime data cleaning plugins themselves must also be provided by their authors. If any future refactors of the main codebase cause errors in plugin compilation or tests, the author will be notified but the code will remain commented out until they supply a fix. @miklcct please determine whether you and/or your organization want to take on future responsibility for this plugin system and submit the changes in this PR as sandbox features. |
I agree with @abyrd her. OTP is not ment implement everything in GTFS, GBFS, or NeTEX. These standards define a very open format where there is room for interperation and optional way to do things. OTP will not support everything. An additional point here is that these standards are only describing a data exchange format - they do not describe the ecosystem of components(software) and witch component should be responsible for witch part. The OTP Sandbox Contract is documented her: https://docs.opentripplanner.org/en/latest/SandboxExtension/
Any transformation/changes to data handled to OTP should be possible to turn on/off, preferable be a Sandbox plugin. I some cases organizations running OTP are not allowed to change or inject data.
My standpoint: OTP is NOT responsible for validating and patching/fixing errors - this should be done in another component. The exception here is that we have allowed very basic patching for small deployments with limited resources. From my point of view OTP should focus on passenger travel planning - and leave everything else to other software components in the ecosystem - I would say OTP is already to beeig and should be split, but that is another discussion.
I would say "individual messages" should be dropped. But I am open to alowe this to be configurable PR updater. |
Given the above, my longer term plan is to remove the whole of all propagation code (including backwards propagation) from the core and make sandbox modules for the strategies I need, is that good? |
Yes, the backwards propagation code should be a component that you can plug as well. |
Ok, I will mark this as draft first, then work on another PR to
|
Summary
The GTFS-RT specification does not require both arrival and departure to be provided for a TripUpdate, only either of them is enough. This fixes NEGATIVE_DWELL_TIME or NEGATIVE_HOP_TIME resulting from these updates as they were not processed correctly.
Issue
Closes #6391
Unit tests
Added
Documentation
None needed
Changelog
Bumping the serialization version id
None needed