prevent inner-link U-turns #83
In the result of the Viterbi algorithm ( |
to.getQueryResult().getClosestNode()); | ||
// enforce heading if required: | ||
if (from.isDirected()) { | ||
from.incomingVirtualEdge.setUnfavored(true); |
Actually, we need to unfavor the reverse edge to make sure that the router does not take the wrong outgoing virtual edge from the virtual "from" node.
Ah yes - forgot both directions on the edge. As per new commit?
} | ||
if (to.isDirected()) { | ||
// unfavor the favour virtual edge | ||
to.outgoingVirtualEdge.setUnfavored(true); |
Same as above. We need to make sure that the router does not take the wrong ingoing virtual edge to the virtual "to" node.
Yes, but that's not how the process currently works - as per here, we compute transition probabilities between all possible from timestep candidates to all possible to timestep candidates, regardless of the state of the Viterbi algorithm. If you meant that we'd need to change the process so that we compute the Viterbi next step and then use the output as the from candidate for the next step (hence we only have one and know its direction), then great - that's what I meant by 'online' in the final paragraph above. As an aside, this should (?) also speed things up a lot, as we'll only have one Hmmm ... though I'm not sure the 'online' method will work well, as it won't know a U-turn is about to happen until it's too late ... Aside - new commit passes the test (though still fails others). |
The Viterbi algorithm needs the transition probabilities between all pairs of subsequent candidates as input. After this input is provided for all time steps, the Viterbi algorithm can compute the most likely sequence of candidates (i.e. the sequence of candidates with highest probability). All this input is needed because the probability of a candidate sequence is defined as the product of all transition and emission probabilities within this candidate sequence. If we have t time steps and n candidates per time step (in practice the number of candidates can vary) then we have n^t different candidate sequences. Because the Viterbi algorithm uses dynamic programming, it does not store or compute the probabilities of all these different candidate sequences but it is still guaranteed to find the most likely sequence of candidates. The Viterbi algorithm is similar to Dijkstra's algorithm, which also uses dynamic programming to find the shortest path between two nodes A and B out of all possible paths between A and B in the graph.
Only after all GPS positions (and all transition and emission probabilities) have been seen does the Viterbi algorithm decide on the most likely candidate sequence. This is because GPS positions of later time steps can contain information about the correct candidate of the current time step. Imagine a T intersection where we have a GPS position near the intersection node and, because of GPS noise, are not yet sure which of the two possible roads was taken (i.e. which is the correct candidate, assuming we have one candidate per real edge). After further GPS positions we can easily decide this. This is why the Viterbi algorithm gives better results than online map matching algorithms if the entire GPS sequence is already known. |
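To make the "decide only at the end" point concrete, here is a minimal Viterbi sketch over log-probabilities. It is an illustration only, not the implementation used in this PR: the class name, the array layout and the numbers in main are all assumptions made up for the example. The first GPS position slightly favours candidate 0, but the later position makes the sequence through candidate 1 the most likely one.

```java
import java.util.Arrays;

/** Illustrative Viterbi sketch; names and data layout are hypothetical. */
final class ViterbiSketch {

    /**
     * @param emission   emission[t][i]: log-probability of candidate i at time step t
     * @param transition transition[t][i][j]: log-probability of going from candidate i
     *                   at step t to candidate j at step t + 1
     * @return the chosen candidate index for every time step (needs at least one step)
     */
    static int[] mostLikelySequence(double[][] emission, double[][][] transition) {
        int steps = emission.length;
        double[][] best = new double[steps][]; // best[t][j]: best log-prob of a sequence ending in j
        int[][] back = new int[steps][];       // back[t][j]: predecessor of j on that sequence
        best[0] = emission[0].clone();
        back[0] = new int[emission[0].length];

        for (int t = 1; t < steps; t++) {
            int n = emission[t].length;
            best[t] = new double[n];
            back[t] = new int[n];
            for (int j = 0; j < n; j++) {
                best[t][j] = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < emission[t - 1].length; i++) {
                    double p = best[t - 1][i] + transition[t - 1][i][j] + emission[t][j];
                    if (p > best[t][j]) {
                        best[t][j] = p;
                        back[t][j] = i;
                    }
                }
            }
        }

        // The final candidate is picked only after the last time step has been processed.
        int last = 0;
        for (int j = 1; j < best[steps - 1].length; j++) {
            if (best[steps - 1][j] > best[steps - 1][last]) {
                last = j;
            }
        }
        // Earlier candidates follow by walking the back pointers.
        int[] path = new int[steps];
        path[steps - 1] = last;
        for (int t = steps - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[][] emission = {
            {Math.log(0.6), Math.log(0.4)},  // first position: candidate 0 looks slightly better
            {Math.log(0.1), Math.log(0.9)}   // later position: clearly on candidate 1's road
        };
        double[][][] transition = {
            {{Math.log(0.9), Math.log(0.1)}, {Math.log(0.1), Math.log(0.9)}}
        };
        System.out.println(Arrays.toString(mostLikelySequence(emission, transition))); // [1, 1]
    }
}
```

The work per time step is proportional to the number of candidate pairs, so the n^t sequences are never enumerated.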
Great - that confirms what I read over the weekend. With that in mind, I'm still not following your comment - if we don't know any final candidates until the end, how can we know the 'actual' direction when we're calculating transition probabilities part-way through? |
Let me use your previous example where we have three GPS positions at A, B and C.
Let's assume we only have one real node candidate at position A and C, respectively. For B we have two candidates for the same virtual node, one in north direction and one in south direction. Hence, we have two possible candidate sequences: A -> B_north -> C and A -> B_south -> C.
The shortest route from A to B_north goes from A to the junction node, to B_north. The shortest route from A to B_south goes from A to the junction node, to the next real node north of B and then back to B_south (assuming that the router does not allow inner-link U-turns). We have similar shortest route paths from B_north/B_south to C. Hence, neither candidate sequence makes an inner-link U-turn at B; a U-turn is only permitted at the next real node north of B. If we had another candidate for B at the junction node then the candidate sequence A -> junction node -> C might have highest probability. @karussell: This raises one question for me: What actually happens when we make a virtual edge unfavored? Does this effectively remove the edge? If yes then we wouldn't even find a shortest route between A and B_south because the incoming virtual edge to B from the junction node would be missing. So what we need is a way to find the shortest route starting at the beginning of a virtual edge and ending at the end of another virtual edge without removing virtual edges completely. We could also achieve this by routing from/to all real nodes adjacent to virtual node candidates and doing the last part of the route length computation ourselves. However, this requires the computation of multiple routes per transition. |
Yes, or at least puts a penalty on it.
This is possible with the edge based traversal (like how we support turn costs), then we wouldn't need any previous information where we come from, but could use this start edge ID being avoided in the next routing call. (As the edge ID is often virtual this is a bit more complex as we need to find the 'opposite' edge, which should be avoided too, but we could find a solution there too) |
I do not understand this example. If we have two virtual nodes then it would be like
So all virtual edges and nodes are visible to other graph explorations. At least this is what we do when we call QueryGraph.lookup only once (and not separately for every routing) |
Currently I think it's still perfectly viable to go A -> B via B_north, then B -> C via B_south, and hence get an inner-link U-turn. That said, I think I see the key difference in our approaches. In the case
I was thinking when we route @stefanholder - if you know how to do it, then it might be quicker to just cut some code instead of waiting for @karussell and I to catch up. |
For GPS point B we have the two candidates B_north and B_south. For each time step (=each GPS position), the Viterbi algorithm picks exactly one candidate. So each candidate sequence has the same length (3 in this example). But you're right in that the shortest path of the transition B_north -> C also goes through B_south.
OK, here is how I will try to do it:
@kodonnell, can you please remove the last 3 commits with |
Blindly did what you suggested - hopefully that's OK? If not, it might be better for you to (I assume you've got push permission?) |
OK, no problem. @karussell, when |
yes
The same as with the normal edges, so you can use the DefaultEdgeFilter with outgoing=true to get just the outgoing
This will return the original edge id but converted into the traversal key which considers the direction. BTW: The traversal key stuff is relatively simple: you have an edgeID but we want to add the 'heading' so we could just use - and + but instead we use |
To individually check the direction of virtual edges, I can also call |
Yes, under the hood DefaultEdgeFilter does this for you.
yes, exactly. forward and backward is always relative to the direction "base node to adjacent node". |
For the sake of clarity, it's worth pointing out that (I think) |
You are right. We just copy the flags from the original edge into the virtual edge and reverse the flags for the reverse virtual edge. So this means that a "forward" virtual edge can have "backward" access property, but this all does not really matter: the algorithm always has a defined encoder&weighting and should care only about what it sees. Hmmh, maybe I discovered a minor suboptimality when thinking about this: the foot encoder would then return for all 4 virtual edges "access=true". |
So is the direction of a virtual edge always from base to adj node? I thought the base node of an edge is always the node passed to |
Yes, this is correct. |
Then how does one distinguish the direction of the incoming virtual edge from the outgoing, if they've both got the same base/adj node? I thought the two outgoing virtual edges would have base node = x and the two incoming would have adj node = x ... |
Can you try with the map data at map-data/issue-70.osm.gz instead of a fresh map? I expect that the more recent map variant changes also how things match. |
Still get same results (with smaller node IDs). |
FYI issue-70.osm.gz is just an export from OSM of the region directly surrounding the relevant route. So, while it's locally the same road network, the actual graph in GH may differ, and affect the results. (For example, the edges are likely to experience different contraction, etc.)
I haven't used main before, but the setup is quite different to that of the test. |
Indeed, can you try the command line again and remove the two subnetwork properties from the arguments and increase max_visited_nodes to e.g. 5000?
The default setting is a disabled CH. Furthermore if the input OSM is the same, the graph should be identical including the contraction, see graphhopper/graphhopper#556 |
After removing the two subnetwork property arguments, I got the following exception during the map import of issue-70.osm.gz:
|
Ok, then it was probably not the reason. BTW: the subnetwork properties remove too small islands, and having a 0 there means no removal. But as there are only a few parameters you should be able to identify the differences: weighting (default is fastest), vehicle, max visited nodes, gps accuracy, algorithm (should be bidir dijkstra) and I guess all subnetwork properties are 0 in the tests too, so that they don't remove anything. |
I can also have a look into the cause |
Thanks. I will try first but maybe will get back to you. |
The problem was that I used mm-uturns1.gpx for the command line and issue-70.gpx for the unit test. The GPS coordinates are the same in both files but in mm-uturns1.gpx all timestamps are the same, whereas in issue-70.gpx timestamps are 60 s apart. For mm-uturns1.gpx (with 60 s timestamp differences) the penalizing of routing paths via |
Ah yes, I vaguely remember doing that, as it wasn't working (and didn't make sense) with all the timestamps the same. As an aside, while thinking up something else, I thought of a different method for completely preventing inner-link U-turns: only route from (real) tower nodes to tower nodes. This will obviously prevent inner-link U-turns (which require virtual nodes) and I think it shouldn't affect map-matching: we'd have to take care with the first/last points of the sequence, but the others don't really need virtual nodes. E.g. in
with Ti tower nodes and Xi virtual nodes, we get the same map-matched route going from X1->X2->X3 as we do from T1->T2->T3->T4. That's a simplistic example - in the general case we may have to do something like have two candidates for each edge (i.e. the base/adj node) instead of just the single virtual one. (Assuming it is virtual - some may, of course, already be tower nodes.) I can go into more detail if required. That said:
Thoughts? |
If timestamps are the same, then map matching as it is currently implemented doesn't work at all, because then all transitions have zero probability. Actually, it would be better to throw an exception in this case or use the non-normalized metric.
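A small sketch of why identical timestamps break the matching. It assumes (this is not shown in the thread) that the normalized metric divides the difference between route length and linear distance by the squared time difference, and that the transition probability is an exponential density of that metric; the class name, method names and the beta value are hypothetical.

```java
/** Illustrative only: assumed form of a time-normalized transition probability. */
final class TransitionSketch {

    /** Assumed metric: |linearDistance - routeLength| / timeDiff^2. */
    static double normalizedTransitionMetric(double routeLengthMeters,
                                             double linearDistanceMeters,
                                             double timeDiffSeconds) {
        return Math.abs(linearDistanceMeters - routeLengthMeters)
                / (timeDiffSeconds * timeDiffSeconds);
    }

    /** Exponential density with scale beta, evaluated at the metric. */
    static double transitionProbability(double beta, double metric) {
        return (1.0 / beta) * Math.exp(-metric / beta);
    }

    public static void main(String[] args) {
        double beta = 2.0; // made-up value for the example

        // 60 s between GPS positions: a small metric and a usable probability.
        double apart = transitionProbability(beta,
                normalizedTransitionMetric(520.0, 500.0, 60.0));

        // Identical timestamps: division by zero gives an infinite metric,
        // so the probability collapses to zero.
        double same = transitionProbability(beta,
                normalizedTransitionMetric(520.0, 500.0, 0.0));

        System.out.println(apart > 0.0); // true
        System.out.println(same);        // 0.0
    }
}
```

Under the same assumption, the case where route length and linear distance are also equal yields 0/0, i.e. NaN, which is another reason to reject or special-case a zero time difference up front.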
It's an interesting approach, which is simple to implement and should still solve the U-turn problem. However, I would expect that the matching quality gets worse in general because in some cases emission and transition probabilities would be biased:
|
Or, even better, have a customisable 'maximum allowed speed' between events (e.g. 200km/h). This would fail on not only time diff = 0, but also similarly small ones. And, even better, don't throw an exception, but (optionally) break the sequence here and start a new one. At least, this is how I'm doing it currently. I may try and get my code into a PR today.
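A rough sketch of the "break the sequence" idea, assuming a simple fix type with latitude, longitude and a timestamp. This is not the code referred to above; the class, the Fix type and the split method are hypothetical names, and the distance is a plain haversine estimate.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: split a GPS track wherever the implied speed is implausible. */
final class SequenceSplitter {

    /** Hypothetical GPS fix: position in degrees, time in milliseconds. */
    static final class Fix {
        final double lat;
        final double lon;
        final long timeMillis;

        Fix(double lat, double lon, long timeMillis) {
            this.lat = lat;
            this.lon = lon;
            this.timeMillis = timeMillis;
        }
    }

    /** Great-circle distance in meters (haversine, spherical earth). */
    static double haversineMeters(Fix a, Fix b) {
        double earthRadius = 6371000.0;
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadius * Math.asin(Math.sqrt(h));
    }

    /** Starts a new sub-sequence whenever two consecutive fixes imply more than maxSpeedKmh. */
    static List<List<Fix>> split(List<Fix> fixes, double maxSpeedKmh) {
        List<List<Fix>> result = new ArrayList<>();
        List<Fix> current = new ArrayList<>();
        for (Fix fix : fixes) {
            if (!current.isEmpty()) {
                Fix prev = current.get(current.size() - 1);
                double seconds = (fix.timeMillis - prev.timeMillis) / 1000.0;
                double meters = haversineMeters(prev, fix);
                // seconds <= 0 covers identical (or out-of-order) timestamps.
                boolean implausible = seconds <= 0 || meters / seconds * 3.6 > maxSpeedKmh;
                if (implausible) {
                    result.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(fix);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

Each returned sub-sequence could then be matched on its own, which avoids both the exception and the zero-probability transitions.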
Those are both good points!
What about using the same distance in the emission probability (i.e. the perpendicular distance of the snapped point to the GPX point). I think that makes sense in the context of HMM emission probabilities.
Maybe we could replace the linear distance between GPX tracks with the linear distance between the nodes? I think this makes sense as currently it's really used to favour straight routes (i.e. the shortest), which would still be true if we used tower-to-tower linear distance. |
We would then have the same emission probability for both tower nodes of the edge and hence the Viterbi algorithm would have no clue where we really are on this edge. Moreover, if we replaced the linear distance between GPS positions with the linear distance between the candidate nodes then we would have very low transition probabilities between both tower node candidates of the same edge no matter where the GPS positions actually are. This would most likely lead to worse map matching results, e.g. going back and forth between the tower nodes of the same edge in the map matching result. In general, an HMM tries to find the correct hidden states for known observations. The better our modelling of hidden states reflects reality, the better results we will get. So I think we should stick to directed virtual node candidates. |
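The first point can be illustrated with a tiny sketch, assuming the emission probability is a zero-mean Gaussian of the distance between the GPS position and the candidate. The class name, method name and sigma value are assumptions for the example only.

```java
/** Illustrative only: Gaussian emission log-probability of a candidate. */
final class EmissionSketch {

    /** Log of the normal density with mean 0 and standard deviation sigma, at the given distance. */
    static double emissionLogProbability(double sigmaMeters, double distanceMeters) {
        return Math.log(1.0 / (Math.sqrt(2.0 * Math.PI) * sigmaMeters))
                - 0.5 * Math.pow(distanceMeters / sigmaMeters, 2);
    }

    public static void main(String[] args) {
        double sigma = 10.0;                  // made-up GPS noise level in meters
        double perpendicularDistance = 4.0;   // GPS position to its snapped point on the edge

        // If both tower nodes of the edge are scored with the same perpendicular distance,
        // their emission probabilities are identical, so emission cannot tell them apart.
        double baseNode = emissionLogProbability(sigma, perpendicularDistance);
        double adjNode = emissionLogProbability(sigma, perpendicularDistance);
        System.out.println(baseNode == adjNode); // true
    }
}
```

With the snapped virtual node as the candidate, the position along the edge is part of the hidden state itself, which is the property argued for above.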
We could more easily represent the 'directed nodes' here, if we use the edge based traversal instead (and fix the 4 edges problem with QueryGraph too of course). |
Maybe I should rephrase - in my head, I'm effectively thinking in terms of only allowing complete real edges - that's what I mean by routing between real tower nodes. (Sorry, I think I went down a rabbit-hole in response to your questions.) Anyway, I don't think anything needs to change - we still use the snapped distance in the emission probabilities, and same linear distance - but we simply don't call
I wonder - is there an easy way we can score our algorithm? We've got unit tests, but they're only pass/fail. I've seen in some papers that they compare the map-matched route with the actual route (including by down-sampling the input GPX track) - this would give us a metric for how 'good' our map-matching actually is. And then we could easily settle such matters = ) @stefanholder have you ever done this sort of error quantification? Or know of any relevant data sets? Maybe a new issue (hopefully I'm not raising too many!).
I don't want to sound like a broken record, but as above I'm still not sure a first order HMM with the standard viterbi algorithm will ever be able to handle this. I am most happy to be wrong, and I think it's a good solution if I am = ) |
Instead of nodes we can route with edges and this should solve this. The only problem in my head would be indifferent solutions where the snap is exactly on a junction node. |
Agreed - as long as we use directed edges?
Maybe, in that case, we add all possible edges coming from that junction node, and leave viterbi to choose which is 'best'? |
Good news: I managed to prevent/penalize inner-link u-turns using
The authors of the paper on which our HMM map matching approach is based provide a GPS test track and the ground truth path, along with a road network in a custom format: http://research.microsoft.com/en-us/um/people/jckrumm/MapMatchingData/data.htm. Moreover, it's easy to create our own GPS test tracks and corresponding ground truth paths. I think setting up a map matching quality benchmark deserves a new issue.
This would work but penalizing inner-link U-turns seems to solve most problems. However, we could try this out in a separate PR. |
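For the quality benchmark mentioned above, one possible score is a route mismatch fraction: the length wrongly left out of the match plus the length wrongly added, divided by the length of the ground truth route. The sketch below assumes routes can be compared as sets of edge IDs, which is a simplification; the class name, method name and edge IDs are made up.

```java
import java.util.Map;
import java.util.Set;

/** Illustrative only: a simple mismatch score between matched route and ground truth. */
final class RouteMismatch {

    /**
     * @return (dMinus + dPlus) / d0, where d0 is the ground truth length, dMinus the length
     *         of ground truth edges missing from the match, and dPlus the length of matched
     *         edges that are not in the ground truth
     */
    static double mismatchFraction(Set<Integer> truthEdges,
                                   Set<Integer> matchedEdges,
                                   Map<Integer, Double> edgeLengthMeters) {
        double d0 = 0.0;
        double dMinus = 0.0;
        double dPlus = 0.0;
        for (int edge : truthEdges) {
            double length = edgeLengthMeters.get(edge);
            d0 += length;
            if (!matchedEdges.contains(edge)) {
                dMinus += length;
            }
        }
        for (int edge : matchedEdges) {
            if (!truthEdges.contains(edge)) {
                dPlus += edgeLengthMeters.get(edge);
            }
        }
        return (dMinus + dPlus) / d0;
    }
}
```

A score of 0 means the matched edges equal the ground truth edges; an unwanted inner-link U-turn detour would show up as added length. Because edges are compared as sets, edge order and repeated traversals are ignored, so a real benchmark would likely need something stricter.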
I just created PR #88 with my changes. I did this as a separate PR because I rewrote the git history of this PR and I didn't want to mess up things. I think we can then close this PR. |
Thanks @stefanholder ! |
Agreed - see #89.
Nice work! Agreed - will close. |
Penalize inner-link U-turns (builds on #83)
See #70
WIP. Things to note (@stefanholder)
I'm still not sure how the logic is going to work. To clarify, consider
with nodes A,B and a GPX position at X (i.e. the virtual node), and virtual edges v1 and v2. What is being done in this PR is creating two candidates for this GPX point: both with closestNode = X, but one with incoming=v1 and outgoing=v2, and the other with that reversed. When we route, we simply unfavor the incoming/outgoing edge to force the other to be used. E.g. if X is the from candidate, then we'd have
candidate 1: incoming=v1 (unfavored), outgoing=v2
candidate 2: incoming=v2 (unfavored), outgoing=v1
We can do similar for the to candidate but unfavor the outgoing edge.
However, how do we actually prevent inner-link U-turns with this? When X is the to timestep, we have a candidate
and when it's the from timestep we have a candidate
in other words the inner-link U-turn is still possible:
@stefanholder - how were you planning to get around this when you said:
Since we don't know the direction of the 'right' to candidate we can't enforce the 'right' direction to start with.
If you agree it's not resolvable, then I think an 'online' (i.e. step by step) solution might be best: we calculate the best sequence up to the given point, and then we can enforce that as the direction at the start of the route to all other candidates.